Anomaly (outlier) detection. Huiping Cao, Anomaly 1

1 Anomaly (outlier) detection Huiping Cao, Anomaly 1

2 Outline
- General concepts
  - What are outliers
  - Types of outliers
  - Causes of anomalies
  - Challenges of outlier detection
- Outlier detection approaches

3 What are outliers
- Outliers: the set of data points that are significantly different from the rest of the objects
- Assumption: there are considerably more normal observations than abnormal observations (outliers/anomalies) in the data
- Applications
  - Fraud detection (credit card usage)
  - Intrusion detection (computer systems, computer networks)
  - Ecosystem disturbances
  - Public health
  - Medicine
- Related: novelty detection

4 Types of outliers
- Global outliers: deviate significantly from the rest of the dataset
  - Also called point anomalies
  - Most outlier detection methods are designed to find such outliers
  - Example: intrusion detection in network traffic

5 Types of outliers
- Contextual (conditional) outliers: an object is an outlier in one context, but may be normal in another context
  - Contextual attributes: define the object's context, e.g., date, location
  - Behavior attributes: define the object's characteristics and are used to evaluate whether the object is an outlier in that context, e.g., temperature
  - A generalization of local outliers, which are defined in density-based analysis
  - Background information is needed to determine the contextual attributes, etc.

6 Types of outliers
- Collective outliers: a subset of data objects forms a collective outlier if the objects as a whole deviate significantly from the entire data set
  - The individual data objects may not be outliers
  - Applications: supply chain, web visiting, network (denial-of-service)
  - Background information is needed to model the relationships among objects

7 Causes of anomalies
- Data from different classes
  - Hawkins' definition of an outlier: an outlier is an observation that differs so much from other observations as to arouse suspicion that it was generated by a different mechanism
- Natural variation
  - Anomalies that represent extreme or unlikely variations (e.g., an extremely tall person)
- Data measurement and collection errors
  - Removing such anomalies is the focus of data preprocessing (data cleaning)
- Others: several sources

8 Outline
- General concepts
  - What are outliers
  - Types of outliers
  - Causes of anomalies
  - Challenges of outlier detection
- Outlier detection approaches

9 Challenges of outlier detection
- Modeling normal and outlier objects
  - It is hard to model normal behavior completely
  - Some methods assign a label (normal or abnormal)
  - Some methods assign a score measuring the "outlier-ness" of the object
- Universal outlier detection is hard to develop
  - Similarity and distance definitions are application-dependent
- Common issue: noise
- Understandability
  - Understand why the detected objects are outliers
  - Provide justification of the detection

10 Outline
- General concepts
  - What are outliers
  - Types of outliers
  - Challenges of outlier detection
- Outlier detection approaches
  - Statistical methods
  - Proximity-based methods
  - Clustering-based methods

11 Outlier detection methods
- Supervised methods
  - Data for analysis are labeled as normal or abnormal by domain experts
  - Can be modeled as a classification problem
  - Special aspect to consider: normal and abnormal data points are imbalanced
  - Measures: recall is more meaningful than accuracy
- Unsupervised methods
  - Largely utilize clustering methods
- Semi-supervised methods

12 Outlier detection methods
- Outlier detection algorithms make assumptions about outliers versus the rest of the data
- Categories according to the assumptions made
  - Statistical (model-based) methods
    - Normal data follow a statistical (stochastic) model
    - Outliers do not follow the model
  - Proximity-based methods
    - The proximity of an outlier to its neighbors is different from the proximity of most other objects to their neighbors
    - Distance-based, density-based
  - Clustering-based methods
    - Normal objects belong to large and dense clusters
    - Outliers belong to small or sparse clusters, or belong to no cluster

13 Statistical approaches
- Probabilistic definition of an outlier: an outlier is an object that has a low probability with respect to a probability distribution model of the data
  - Normal objects are generated by a stochastic process and occur in regions of high probability for the stochastic model
  - Outliers occur in regions of low probability
- Approach steps
  - Learn a generative model fitting the given data
  - Identify the objects in low-probability regions of the model
- Categories
  - Parametric methods (univariate, multivariate)
  - Nonparametric methods

14 Parametric: univariate Normal Distribution
- Normal distribution, maximum likelihood estimation (MLE)
  - Standard normal distribution, N(0,1)
  - Non-standard normal distribution, N(μ,σ²), z-score
  - Use MLE to estimate μ and σ²

15 Parametric: univariate Normal Distribution
- prob(|x| ≥ c) = α for N(0,1)
- Mark an object as an outlier if it is more than 3σ away from the estimated mean μ, where σ is the standard deviation (the μ±3σ region contains 99.73% of the data)
- [Table: (c, α) pairs for N(0,1)]

16 Parametric: univariate Normal Distribution
- Example: a city's average temperature values in 10 years: 24, 28.9, 28.9, 29, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4
- μ = 28.61, σ² = 2.29, σ = sqrt(2.29) = 1.51
- Is 24 an outlier? z-score = |24 − 28.61|/1.51 = 3.04 > 3
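The z-score test in this example can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the helper name `mle_normal` is an assumption made for illustration. Note that recomputing from the ten listed values with the MLE (divide-by-n) variance gives σ² ≈ 2.38 and σ ≈ 1.54 rather than the quoted 2.29 and 1.51, which puts the z-score of 24 at about 2.99, marginally under the 3σ cutoff, so the example is best read as a borderline case.

```python
import math

# Temperature values from the example above.
temps = [24.0, 28.9, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4]


def mle_normal(xs):
    """Return the maximum-likelihood estimates (mu, sigma^2) of a normal fit."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # MLE divides by n, not n - 1
    return mu, var


mu, var = mle_normal(temps)
sigma = math.sqrt(var)
z = abs(24.0 - mu) / sigma  # z-score of the suspicious value 24

print(f"mu={mu:.2f} var={var:.2f} sigma={sigma:.2f} z={z:.2f}")
print("outlier under the 3-sigma rule:", z > 3)
```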

17 Parametric: other univariate outlier detection approaches (S.S.)
- Boxplot method
- Grubbs' test (maximum normed residual test)
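As an illustration of the boxplot method, the sketch below flags values outside the whiskers. The 1.5 × IQR fence is the conventional Tukey rule and is an assumption here, since the slide does not state a multiplier; the function name `boxplot_outliers` is likewise hypothetical. Quartiles come from the standard library's `statistics.quantiles` with its default method.

```python
import statistics


def boxplot_outliers(xs, whisker=1.5):
    """Flag values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    q1, _median, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [x for x in xs if x < low or x > high]


# Same temperature data as the univariate example.
temps = [24.0, 28.9, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4]
outliers = boxplot_outliers(temps)
print(outliers)
```

Unlike the 3σ rule, the quartile-based fences are barely affected by the extreme value itself, which is one reason the boxplot method is often preferred on small samples.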

18 Parametric: multivariate
- Multivariate data: convert the problem to a univariate outlier detection problem
- Use the Mahalanobis distance from object o to the mean μ
- Use the χ² statistic
  - o_i: the value of o on the i-th dimension
  - E_i: the mean of the i-th dimension over all objects
  - n: the number of dimensions
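The χ² statistic is commonly written as χ² = Σ_{i=1..n} (o_i − E_i)² / E_i, which reduces each multivariate object to a single score that can then be thresholded. The sketch below assumes that form; the data set and the function names are hypothetical, and every per-dimension mean is assumed to be non-zero.

```python
def column_means(objects):
    """Mean of each dimension over all objects."""
    return [sum(col) / len(objects) for col in zip(*objects)]


def chi_square_score(o, means):
    """chi^2 = sum_i (o_i - E_i)^2 / E_i, assuming every E_i is non-zero."""
    return sum((oi - ei) ** 2 / ei for oi, ei in zip(o, means))


# Hypothetical two-dimensional data; the last object is far from the rest.
data = [(10, 20), (11, 19), (9, 21), (10, 20), (30, 5)]
means = column_means(data)
scores = [chi_square_score(o, means) for o in data]

for o, s in zip(data, scores):
    print(o, round(s, 2))
```

A large score marks a candidate outlier; choosing the cutoff is then a univariate problem.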

19 Nonparametric
- Nonparametric methods make fewer assumptions about the data distribution, and are thus applicable in more scenarios
- Histogram approach
  - Construct histograms (types: equal width or equal depth; choose the number of bins or the size of each bin)
  - Outliers: objects that fall in no bin or in bins with small counts
  - Drawback: it is hard to decide the bin size
- Others: kernel functions (discussed further in machine learning)
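A minimal sketch of the histogram approach with equal-width bins follows. The parameters `num_bins` and `min_count` are hypothetical knobs introduced for illustration; as noted above, picking them well is the hard part.

```python
def histogram_outliers(xs, num_bins, min_count):
    """Flag values that fall into equal-width bins holding fewer than min_count values."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / num_bins
    if width == 0:  # all values identical: nothing can be flagged
        return []

    def bin_of(x):
        # Clamp so that the maximum value lands in the last bin.
        return min(int((x - lo) / width), num_bins - 1)

    counts = [0] * num_bins
    for x in xs:
        counts[bin_of(x)] += 1
    return [x for x in xs if counts[bin_of(x)] < min_count]


temps = [24.0, 28.9, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4]
flagged = histogram_outliers(temps, num_bins=5, min_count=2)
print(flagged)
```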

20 Outline
- General concepts
  - What are outliers
  - Types of outliers
  - Challenges of outlier detection
- Outlier detection approaches
  - Statistical methods
  - Proximity-based methods
  - Clustering-based methods

21 Proximity-based Approaches
- Data is represented as a vector of features
- Based on the neighborhood
- Major approaches
  - Distance based
  - Density based

22 Distance-based approach
- Anomaly: an object that is distant from most points
- Distance to the k-th nearest neighbor: the outlier score of an object is given by the distance to its k-th nearest neighbor
- Outliers: objects whose score exceeds a threshold
- Problem: it is hard to decide k (see next slides)
- Improvement: use the average of the distances to the k nearest neighbors
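Both scoring variants can be sketched in a few lines of brute-force Python. The point set and the function name `knn_scores` are hypothetical; a real implementation would use a spatial index instead of sorting all pairwise distances.

```python
import math


def knn_scores(points, k, average=False):
    """Outlier score per point: distance to the k-th nearest neighbor,
    or the average distance to the k nearest neighbors if average=True."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k if average else dists[k - 1])
    return scores


# Hypothetical data: four corners of a unit square plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
kth = knn_scores(points, k=2)
avg = knn_scores(points, k=2, average=True)
print(kth)
print(avg)
```

The distant point receives the largest score under both variants; the averaged variant is less sensitive to the exact choice of k.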

23 [Figures: with k=1, the outlier is O; with k=5, all points in the upper-right corner are outliers]

24 Distance-based outlier detection
- Given: a dataset D with n data points, a distance threshold r, and a fraction threshold π
- r-neighborhood of o: the objects within distance r of o
- Object o is a DB(r,π)-outlier if |{o' | dist(o,o') ≤ r}| / |D| ≤ π
- Approach: compute the distance between every pair of data points
  - O(n²) in the worst case
  - Practically, O(n)
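The nested-loop approach can be sketched as below. One way to see why the running time is often close to linear in practice: the inner loop can stop as soon as an object has collected more than πn neighbors within distance r. The sketch assumes that o itself is not counted in its own r-neighborhood; the data and the function name `db_outliers` are hypothetical.

```python
import math


def db_outliers(points, r, pi):
    """Return the DB(r, pi)-outliers: objects with at most pi * n other objects within distance r."""
    n = len(points)
    outliers = []
    for i, o in enumerate(points):
        count = 0
        for j, other in enumerate(points):
            if i != j and math.dist(o, other) <= r:
                count += 1
                if count > pi * n:
                    break  # enough close neighbors: o cannot be an outlier
        else:
            # Inner loop finished without break: o has too few neighbors.
            outliers.append(o)
    return outliers


points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
found = db_outliers(points, r=1.5, pi=0.2)
print(found)
```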

25 A grid-based method implementation
- Cell diagonal length: r/2
- Cell edge length: r/(2·sqrt(d)), where d is the number of dimensions
- Level-1 cells
  - Direct neighbor cells of a cell C
  - For o in C, any point o' in such cells has dist(o,o') ≤ r
- Level-2 cells
  - One or two cells away from a cell C
  - For o in C, any point o' with dist(o,o') > r must be in a level-2 cell

26 A grid-based method implementation
- Pruning
  - n_0: total number of objects in a cell C
  - n_1: total number of objects in C's level-1 cells
  - n_2: total number of objects in C's level-2 cells
- Level-1 cell pruning: if (n_0 + n_1) > πn, no object o in C is an outlier
- Level-2 cell pruning: if (n_0 + n_1 + n_2) < πn + 1, all the points in C are outliers
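The two pruning rules translate directly into a per-cell decision. The sketch below only encodes the rules as stated and the cell edge length implied by a diagonal of r/2; building the grid and counting n_0, n_1, n_2 are left out, and the function names and return labels are hypothetical.

```python
import math


def cell_edge_length(r, d):
    """Edge length of a d-dimensional cell whose diagonal is r/2."""
    return r / (2 * math.sqrt(d))


def prune_cell(n0, n1, n2, pi, n):
    """Decide the fate of all objects in a cell C from the cell-level counts.

    n0: objects in C, n1: objects in C's level-1 cells,
    n2: objects in C's level-2 cells, n: size of the whole data set.
    """
    if n0 + n1 > pi * n:
        return "no outliers"          # level-1 rule: every object has enough neighbors
    if n0 + n1 + n2 < pi * n + 1:
        return "all outliers"         # level-2 rule: too few objects can be within r
    return "check individually"       # neither rule applies: fall back to distance checks


print(cell_edge_length(r=2.0, d=4))
print(prune_cell(3, 4, 10, pi=0.05, n=100))
print(prune_cell(1, 1, 2, pi=0.05, n=100))
print(prune_cell(1, 2, 10, pi=0.05, n=100))
```

Only cells that land in the third case require object-by-object distance computations, which is where the savings over the nested loop come from.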

27 Distance-based outlier detection
- Finds global outliers: cannot handle data sets with regions of different densities
[Figure: points p1 and p2 in regions of different density]

28 Proximity-based Approaches
- Data is represented as a vector of features
- Based on the neighborhood
- Major approaches
  - Distance based
  - Density based

29 Density-based outlier detection
- Local proximity-based outliers
- Compare the density around one object with the density around its local neighbors
[Figure: points p1 and p2 in regions of different density]

30 Density based
- D: a set of objects
- Nearest neighbor of o: d(o,D) = min{d(o,o') | o' in D}
- Local outliers: outliers relative to their local neighborhoods, particularly with respect to the densities of the neighborhoods
- Density-based outlier: the outlier score of an object is the inverse of the density around the object

31 Concepts
- k-distance of an object o, d_k(o): used to measure the relative density of an object o
  - Formally, d_k(o) = d(o,p) such that
    - for at least k objects o' in D\{o}, d(o,o') ≤ d(o,p)
    - for at most k−1 objects o' in D\{o}, d(o,o') < d(o,p)
- k-distance neighborhood of an object o: N_k(o) = {o' | o' in D, d(o,o') ≤ d_k(o)}
  - N_k(o) may contain more than k objects
- Measure of local density: average distance from o to the objects in N_k(o)
  - Problem: fluctuations

32 Concepts
- Reachability distance: reachdist(o→o') = max{d_k(o'), d(o,o')}
  - Alleviates fluctuations
  - Not symmetric: reachdist(o→o') ≠ reachdist(o'→o)
- Local density of o: the inverse of the average reachability distance from o to the objects in N_k(o)
  density_k(o) = |N_k(o)| / Σ_{o' ∈ N_k(o)} reachdist(o→o') = |N_k(o)| / Σ_{o' ∈ N_k(o)} max{d_k(o'), d(o,o')}
- Different from the density definition in density-based clustering (global vs. local)

33 Example
- k=2, use Euclidean distance
- Distance from o to o's 2NN is 1, so d_k(o) = 1
- N_k(o) = {p1, p2, p3}
- d_k(p1) = 1.28, dist(o,p1) = 0.8
- d_k(p2) = sqrt(2) = 1.41, dist(o,p2) = 1
- d_k(p3) = sqrt(0.32) = 0.57, dist(o,p3) = 1
- reachdist(o→p1) = max{1.28, 0.8} = 1.28
- reachdist(o→p2) = max{1.41, 1} = 1.41
- reachdist(o→p3) = max{0.57, 1} = 1
- density_k(o) = 3/(1.28 + 1.41 + 1) = 0.81
[Figure: scatter plot of o, p1, p2, p3 on x-y axes]
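The arithmetic of this example can be reproduced with the short sketch below, which takes the k-distances and distances quoted above as given (the point coordinates themselves are not recoverable from the transcription) and applies reachdist(o→p) = max{d_k(p), dist(o,p)}. The dictionary layout is an assumption made for illustration.

```python
# For each neighbor p of o: (d_k(p), dist(o, p)), values taken from the example.
neighbors = {
    "p1": (1.28, 0.8),
    "p2": (1.41, 1.0),
    "p3": (0.57, 1.0),
}

# reachdist(o -> p) = max{d_k(p), dist(o, p)}
reach = {name: max(d_k, dist) for name, (d_k, dist) in neighbors.items()}

# density_k(o) = |N_k(o)| / sum of reachability distances
density_o = len(reach) / sum(reach.values())

print(reach)
print(round(density_o, 2))
```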

34 Local outlier factor (LOF) (or average relative density of o)
- Average ratio between the local reachability density of each of the k-nearest neighbors of o and the local reachability density of o
- The lower density_k(o) and the higher density_k(o') ⇒ the higher the LOF → the higher the probability that o is an outlier
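Written out, a standard form of the definition is LOF_k(o) = (1/|N_k(o)|) · Σ_{o' ∈ N_k(o)} density_k(o') / density_k(o). The brute-force sketch below follows that form together with the k-distance, neighborhood, and reachability definitions from the preceding slides. The data set and function names are hypothetical, and the code assumes all points are distinct so that no reachability sum is zero.

```python
import math


def k_distance_and_neighbors(points, i, k):
    """Return d_k(o) and the indices in N_k(o) for o = points[i]."""
    dists = sorted((math.dist(points[i], q), j) for j, q in enumerate(points) if j != i)
    k_dist = dists[k - 1][0]
    # Ties at the k-distance are included, so N_k(o) may hold more than k objects.
    return k_dist, [j for d, j in dists if d <= k_dist]


def lof_scores(points, k):
    """Local outlier factor of every point (assumes distinct points)."""
    n = len(points)
    info = [k_distance_and_neighbors(points, i, k) for i in range(n)]

    def density(i):
        neigh = info[i][1]
        # reachdist(o -> o') = max{d_k(o'), d(o, o')}
        total = sum(max(info[j][0], math.dist(points[i], points[j])) for j in neigh)
        return len(neigh) / total

    dens = [density(i) for i in range(n)]
    return [
        sum(dens[j] for j in info[i][1]) / (len(info[i][1]) * dens[i])
        for i in range(n)
    ]


# Hypothetical data: a tight unit square plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
print([round(s, 2) for s in scores])
```

Points inside a uniformly dense region score close to 1, while the isolated point scores well above 1, matching the interpretation given on the slide.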

35 Example (continued)
- Setup as in the example on slide 33 (k=2, Euclidean distance, N_k(o) = {p1, p2, p3})
- Then, calculate density_k(p1), density_k(p2), density_k(p3)

36 Outline
- General concepts
  - What are outliers
  - Types of outliers
  - Challenges of outlier detection
- Outlier detection approaches
  - Statistical methods
  - Proximity-based methods
  - Clustering-based methods

37 Clustering-Based
- Clustering-based outlier: an object is a cluster-based outlier if it does not strongly belong to any cluster
- An outlier is an object that belongs to a small and remote cluster, or that does not belong to any cluster

38 Clustering-Based
- Basic step: cluster the data into groups of different density
- Three general approaches
  - An object does not belong to any cluster → outlier
  - There is a large distance between an object and the cluster to which it is closest → outlier
  - The object is part of a small and sparse cluster → all the objects in that cluster are outliers

39 Approach 2
- There is a large distance between an object and the cluster to which it is closest → outlier
- Calculate the ratio below; the larger the ratio, the farther away o is from its closest cluster C_o
  ratio = d(o, c_o) / ( Σ_{o' ∈ C_o} d(o', c_o) / |C_o| )
  where c_o is the center of C_o
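The ratio compares the distance from o to the center of its closest cluster with the average distance of that cluster's members to the same center. The sketch below assumes the center c_o is the centroid (coordinate-wise mean) and that the closest cluster has already been identified; the data and the function name are hypothetical.

```python
import math


def relative_distance_ratio(o, cluster):
    """d(o, c_o) divided by the average distance of cluster members to the centroid c_o."""
    centroid = tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
    avg_member_dist = sum(math.dist(p, centroid) for p in cluster) / len(cluster)
    return math.dist(o, centroid) / avg_member_dist


# Hypothetical closest cluster: the corners of a unit square.
cluster = [(0, 0), (0, 1), (1, 0), (1, 1)]
ratio_far = relative_distance_ratio((5, 5), cluster)
ratio_member = relative_distance_ratio((0, 0), cluster)
print(round(ratio_far, 2), round(ratio_member, 2))
```

A ratio near 1 means the object sits about as far from the center as a typical member; the distant point here is roughly nine times farther out.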

40 Outliers in Lower Dimensional Projection
- In high-dimensional space, data is sparse and the notion of proximity becomes meaningless
  - Every point is an almost equally good outlier from the perspective of proximity-based definitions
- Lower-dimensional projection methods
  - A point is an outlier if, in some lower-dimensional projection, it is present in a local region of abnormally low density

41 R packages
- R: a parallel implementation of Local Outlier Factor (LOF) that uses multiple CPUs to significantly speed up the LOF computation for large datasets
- Python LOF implementation:


More information

BNG 495 Capstone Design. Descriptive Statistics

BNG 495 Capstone Design. Descriptive Statistics BNG 495 Capstone Design Descriptive Statistics Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential statistical methods, with a focus

More information

Surprise Detection in Science Data Streams Kirk Borne Dept of Computational & Data Sciences George Mason University

Surprise Detection in Science Data Streams Kirk Borne Dept of Computational & Data Sciences George Mason University Surprise Detection in Science Data Streams Kirk Borne Dept of Computational & Data Sciences George Mason University kborne@gmu.edu, http://classweb.gmu.edu/kborne/ Outline Astroinformatics Example Application:

More information

Issues and Techniques in Pattern Classification

Issues and Techniques in Pattern Classification Issues and Techniques in Pattern Classification Carlotta Domeniconi www.ise.gmu.edu/~carlotta Machine Learning Given a collection of data, a machine learner eplains the underlying process that generated

More information

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology

More information

Chap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University

Chap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University Chap 1. Overview of Statistical Learning (HTF, 2.1-2.6, 2.9) Yongdai Kim Seoul National University 0. Learning vs Statistical learning Learning procedure Construct a claim by observing data or using logics

More information

A Family of Joint Sparse PCA Algorithms for Anomaly Localization in Network Data Streams. Ruoyi Jiang

A Family of Joint Sparse PCA Algorithms for Anomaly Localization in Network Data Streams. Ruoyi Jiang A Family of Joint Sparse PCA Algorithms for Anomaly Localization in Network Data Streams By Ruoyi Jiang Submitted to the graduate degree program in Department of Electrical Engineering and Computer Science

More information

Modern Information Retrieval

Modern Information Retrieval Modern Information Retrieval Chapter 8 Text Classification Introduction A Characterization of Text Classification Unsupervised Algorithms Supervised Algorithms Feature Selection or Dimensionality Reduction

More information

Linear Regression. Aarti Singh. Machine Learning / Sept 27, 2010

Linear Regression. Aarti Singh. Machine Learning / Sept 27, 2010 Linear Regression Aarti Singh Machine Learning 10-701/15-781 Sept 27, 2010 Discrete to Continuous Labels Classification Sports Science News Anemic cell Healthy cell Regression X = Document Y = Topic X

More information

Data Mining algorithms

Data Mining algorithms Data Mining algorithms 2017-2018 spring 02.07-09.2018 Overview Classification vs. Regression Evaluation I Basics Bálint Daróczy daroczyb@ilab.sztaki.hu Basic reachability: MTA SZTAKI, Lágymányosi str.

More information

Nearest Neighbors Methods for Support Vector Machines

Nearest Neighbors Methods for Support Vector Machines Nearest Neighbors Methods for Support Vector Machines A. J. Quiroz, Dpto. de Matemáticas. Universidad de Los Andes joint work with María González-Lima, Universidad Simón Boĺıvar and Sergio A. Camelo, Universidad

More information

CPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017

CPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017 CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class

More information

ECE521 Lecture7. Logistic Regression

ECE521 Lecture7. Logistic Regression ECE521 Lecture7 Logistic Regression Outline Review of decision theory Logistic regression A single neuron Multi-class classification 2 Outline Decision theory is conceptually easy and computationally hard

More information

Anomaly Detection via Over-sampling Principal Component Analysis

Anomaly Detection via Over-sampling Principal Component Analysis Anomaly Detection via Over-sampling Principal Component Analysis Yi-Ren Yeh, Zheng-Yi Lee, and Yuh-Jye Lee Abstract Outlier detection is an important issue in data mining and has been studied in different

More information

CPSC 340: Machine Learning and Data Mining. Linear Least Squares Fall 2016

CPSC 340: Machine Learning and Data Mining. Linear Least Squares Fall 2016 CPSC 340: Machine Learning and Data Mining Linear Least Squares Fall 2016 Assignment 2 is due Friday: Admin You should already be started! 1 late day to hand it in on Wednesday, 2 for Friday, 3 for next

More information

Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1

Introduction to Machine Learning. Introduction to ML - TAU 2016/7 1 Introduction to Machine Learning Introduction to ML - TAU 2016/7 1 Course Administration Lecturers: Amir Globerson (gamir@post.tau.ac.il) Yishay Mansour (Mansour@tau.ac.il) Teaching Assistance: Regev Schweiger

More information

Modeling Complex Temporal Composition of Actionlets for Activity Prediction

Modeling Complex Temporal Composition of Actionlets for Activity Prediction Modeling Complex Temporal Composition of Actionlets for Activity Prediction ECCV 2012 Activity Recognition Reading Group Framework of activity prediction What is an Actionlet To segment a long sequence

More information

Intensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis

Intensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis Intensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis Chris Funk Lecture 4 Spatial Point Patterns Definition Set of point locations with recorded events" within study

More information

MULTI-LEVEL RELATIONSHIP OUTLIER DETECTION

MULTI-LEVEL RELATIONSHIP OUTLIER DETECTION MULTI-LEVEL RELATIONSHIP OUTLIER DETECTION by Qiang Jiang B.Eng., East China Normal University, 2010 a Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in

More information

Global Scene Representations. Tilke Judd

Global Scene Representations. Tilke Judd Global Scene Representations Tilke Judd Papers Oliva and Torralba [2001] Fei Fei and Perona [2005] Labzebnik, Schmid and Ponce [2006] Commonalities Goal: Recognize natural scene categories Extract features

More information

Intensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis

Intensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis Intensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis Chris Funk Lecture 5 Topic Overview 1) Introduction/Unvariate Statistics 2) Bootstrapping/Monte Carlo Simulation/Kernel

More information

A Bayesian Method for Guessing the Extreme Values in a Data Set

A Bayesian Method for Guessing the Extreme Values in a Data Set A Bayesian Method for Guessing the Extreme Values in a Data Set Mingxi Wu University of Florida May, 2008 Mingxi Wu (University of Florida) May, 2008 1 / 74 Outline Problem Definition Example Applications

More information

Midterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian

More information

An Introduction to Machine Learning

An Introduction to Machine Learning An Introduction to Machine Learning L2: Instance Based Estimation Alexander J. Smola Statistical Machine Learning Program Canberra, ACT 0200 Australia Alex.Smola@nicta.com.au Tata Institute, Pune, January

More information

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University January 17 2019 Logistics HW 1 is on Piazza and Gradescope Deadline: Friday, Jan. 25, 2019 Office

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Data & Data Preprocessing & Classification (Basic Concepts) Huan Sun, CSE@The Ohio State University Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han Chapter

More information

Batch Mode Sparse Active Learning. Lixin Shi, Yuhang Zhao Tsinghua University

Batch Mode Sparse Active Learning. Lixin Shi, Yuhang Zhao Tsinghua University Batch Mode Sparse Active Learning Lixin Shi, Yuhang Zhao Tsinghua University Our work Propose an unified framework of batch mode active learning Instantiate the framework using classifiers based on sparse

More information

INTRODUCTION TO DATA SCIENCE

INTRODUCTION TO DATA SCIENCE INTRODUCTION TO DATA SCIENCE JOHN P DICKERSON Lecture #13 3/9/2017 CMSC320 Tuesdays & Thursdays 3:30pm 4:45pm ANNOUNCEMENTS Mini-Project #1 is due Saturday night (3/11): Seems like people are able to do

More information

Machine Learning (CS 567) Lecture 5

Machine Learning (CS 567) Lecture 5 Machine Learning (CS 567) Lecture 5 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol

More information

Chemometrics: Classification of spectra

Chemometrics: Classification of spectra Chemometrics: Classification of spectra Vladimir Bochko Jarmo Alander University of Vaasa November 1, 2010 Vladimir Bochko Chemometrics: Classification 1/36 Contents Terminology Introduction Big picture

More information

Graph-Based Anomaly Detection with Soft Harmonic Functions

Graph-Based Anomaly Detection with Soft Harmonic Functions Graph-Based Anomaly Detection with Soft Harmonic Functions Michal Valko Advisor: Milos Hauskrecht Computer Science Department, University of Pittsburgh, Computer Science Day 2011, March 18 th, 2011. Anomaly

More information

ECE521 lecture 4: 19 January Optimization, MLE, regularization

ECE521 lecture 4: 19 January Optimization, MLE, regularization ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity

More information

Surprise Detection in Multivariate Astronomical Data Kirk Borne George Mason University

Surprise Detection in Multivariate Astronomical Data Kirk Borne George Mason University Surprise Detection in Multivariate Astronomical Data Kirk Borne George Mason University kborne@gmu.edu, http://classweb.gmu.edu/kborne/ Outline What is Surprise Detection? Example Application: The LSST

More information

Multivariate Analysis of Crime Data using Spatial Outlier Detection Algorithm

Multivariate Analysis of Crime Data using Spatial Outlier Detection Algorithm J. Stat. Appl. Pro. 5, No. 3, 433-438 (2016) 433 Journal of Statistics Applications & Probability An International Journal http://dx.doi.org/10.18576/jsap/050307 Multivariate Analysis of Crime Data using

More information

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric

More information

Kernel expansions with unlabeled examples

Kernel expansions with unlabeled examples Kernel expansions with unlabeled examples Martin Szummer MIT AI Lab & CBCL Cambridge, MA szummer@ai.mit.edu Tommi Jaakkola MIT AI Lab Cambridge, MA tommi@ai.mit.edu Abstract Modern classification applications

More information

12 - Nonparametric Density Estimation

12 - Nonparametric Density Estimation ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6

More information

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata Principles of Pattern Recognition C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata e-mail: murthy@isical.ac.in Pattern Recognition Measurement Space > Feature Space >Decision

More information

Classification: Decision Trees

Classification: Decision Trees Classification: Decision Trees These slides were assembled by Byron Boots, with grateful acknowledgement to Eric Eaton and the many others who made their course materials freely available online. Feel

More information

Supplementary Material for Wang and Serfling paper

Supplementary Material for Wang and Serfling paper Supplementary Material for Wang and Serfling paper March 6, 2017 1 Simulation study Here we provide a simulation study to compare empirically the masking and swamping robustness of our selected outlyingness

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Probabilistic clustering

Probabilistic clustering Aprendizagem Automática Probabilistic clustering Ludwig Krippahl Probabilistic clustering Summary Fuzzy sets and clustering Fuzzy c-means Probabilistic Clustering: mixture models Expectation-Maximization,

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Slide Set 3: Detection Theory January 2018 Heikki Huttunen heikki.huttunen@tut.fi Department of Signal Processing Tampere University of Technology Detection theory

More information