Klasifikace pomocí hloubky dat nové nápady

Size: px
Start display at page:

Download "Klasifikace pomocí hloubky dat nové nápady"

Transcription

1 Klasifikace pomocí hloubky dat nové nápady Ondřej Vencálek Univerzita Palackého v Olomouci ROBUST Rybník, 26. ledna 2018

2

3 Z. Fabián: Vzpomínka na Compstat 2010 aneb večeře na Pařížské radnici, Informační Bulletin ČStS, 4/2017 Nápad pořádat konferenci v Paříži se dustojně řadí k nápadu pořádat mistrovství světa v Kataru nebo letní olympiádu v Letňanech....

4 It is deep inside = it has high depth

5 It is deep inside = it has high depth Trees DEEP inside the forest

6 It is deep inside = it has high depth Trees DEEP inside the forest Trees far from the center

7 It is deep inside = it has high depth

8 Outliers in R 1 outlier

9 Outliers in R 2 any idea?

10 1st idea Mahalanobist distance

11 1st idea Mahalanobist distance non-elliptical case

12 1st idea Mahalanobist distance non-elliptical case

13 Halfspace depth in R 1 D(x) = min {P(X x), P(X x)} Example: univariate normal distribution N(µ, σ 2 ) density(x) outliers the deepest point (median) decreasing depth decreasing depth outliers

14 Halfspace depth in R d projection pursuit approach D(x) = inf u: u =1 D(uT x). x 2? x 1

15 The halfspace depth of a point x R d with respect to a probability measure P on R d is defined as the minimum probability mass carried by any closed halfspace containing x, that is D(x; P) = inf {P(H) : H a closed halfspace, x H} The depth provides so called central-outward ordering of data The deepest point extends the notion of median into a higher dimensions.

16 Problem of classification The Bayes classifier class(x) = arg max f i (x)π i. i produces smallest possible number of misclassified points (total over all groups)

17 When Bayes misclassifies the centre of the distribution... P 1 = N(0, 1), P 2 = N(1, 1) π 1 = 0.7, π 2 = 0.3 posterior x

18 Hey, Bayes, be fair to x 2! lognormal distribution from N(0, 1) 0.6 f(x 1 ) density f(x 2 ) x x x quantile 0.05 = x 1, f (x 1 ) = 0.53 quantile 0.95 = x 2, f (x 2 ) = 0.02

19 Basics in decision theory Distributions P i defined on R d, with densities f i and prior probabilities π i, i = 1,..., K. A classifier divides the space R d into K disjoint parts A i, i = 1,..., K, K i=1 A i = R d such that any x R d is assigned to P i iff x A i. Cost function c ij (x) = { c i (x) if j i, 0 if j = i. Total cost min K i=1 j i A j c i (x)f i (x)π i dx. Optimal classifier class(x) = arg max c i (x)f i (x)π i. i

20 Bayes classifier and its 2 new competitors class(x) = arg max c i (x)f i (x)π i. i 1. Bayes classifier: c i (x) = 1 class B (x) = arg max f i (x)π i i 2. Depth-weighted classifier: c i (x) = D(x; P i ) class D (x) = arg max D(x; P i )f i (x)π i i 3. Rank-weighted classifier: c i (x) = F i (D(x; P i )), where F i is CDF of D(X ; P i ), X P i class R (x) = arg max F i (D(x; P i ))f i (x)π i i

21 Toy example P 1 = Unif [0, 100], π 1 = 0.5, P 2 = Unif [50, 250], π 2 = 0.5. Classification on the overlapping part of supports: Bayes classifier: Depth-weighted classifier: class B (x) = 1 for x [50, 100] class D (x) = { 1 for x [50, 90), 2 for x (90, 100].

22

23 The first question (or two) To what extent do the new (depth-weighted) classifiers differ from the Bayes classifier? How large might the corresponding difference in the average misclassification rate be?

24 Difference in the case of elliptical symmetry Let us assume: (P1): f i (x) = k i g(m i (x)), where g is a strictly decreasing function, k i > 0 are constants, and M i (x) = ( (x α i ) B 1 i (x α i ) ) 1 2 with B i positive definite, α i R d, denotes the generalized distance of the point x from the center of the distribution P i. (P2) D(x) is an affine invariant depth. Then, given (P1), D i (x) = D(x; P i ) is a fixed decreasing function of generalized distance, that can be expressed as D i (x) = h(m i (x)), where h is a strictly decreasing function. Denote G(x) = k1g(m1(x)) k 2g(M 2(x)) H(x) = h(m1(x)) h(m 2(x)) π = π2 π 1 Then as likelihood ratio, as depth ratio, as inverse prior ratio The Bayes classifier assigns x to P 1 if G(x) > π, The depth-weighted classifier assigns x to P 1 if H(x)G(x) > π.

25 Difference in the case of elliptical symmetry x: M1 (x)>m2 (x) 0 H(x)G(x) G(x) π k_1 k2 x: M1 (x)<m2 (x) 0 k_1 k2 G(x) H(x)G(x) π The classifiers differ when π is between G (x) and H(x)G (x). For the fixed π, the region where the classifiers differ is RD(π) = x Rd : H(x)G (x) < π < G (x) or G (x) < π < H(x)G (x). Let (P1) and (P2) hold for P1 and P2. Then P(classB (X ) 6= classd (X )) = 0 π1 k1 = π2 k2.

26 Difference in the case of elliptical symmetry example P 1 = N( 1, 1), P 2 = N(1, 1) probability of different classification log(π 1 π 2 )

27 Next question To what extent does the performance of the new classifiers depend on the choice of depth function? Which depth function results in the smallest (biggest) differences from the Bayes classifier?

28 Difference between the depth-based classifiers using halfspace and projection depths P 1 = Unif [0, 100], π 1 = 0.5, P 2 = Unif [50, 250], π 2 = 0.5. D(x)f(x)π for the halfspace depth D(x)f(x)π for the projection depth D(x)f(x)π D(x)f(x)π x x

29 Rank-weighted classifier Depth-weighted classifier: c i (x) = D(x; P i ) class D (x) = arg max D(x; P i )f i (x)π i i Rank-weighted classifier: c i (x) = F i (D(x; P i )), where F i is CDF of D(X ; P i ), X P i class R (x) = arg max F i (D(x; P i ))f i (x)π i i Let (P1) and (P2) hold for P 1 and P 2 and any two depth functions D( ) and D ( ). Then class R (x) is the same for both D( ) and D ( ) with probability one.

30 Robustness of the newly proposed classifiers density 0 z 2z 3z z0 3z 3z+z0 4z x (1 α)π 1 for P 1, απ 1 for the contamination of P 1 π 2 for the non-contaminated P 2. Assume z 0 = z, π 2 < απ 1.

31 Robustness of the newly proposed classifiers density 0 z 2z 3z z0 3z 3z+z0 4z x (1 α)π 1 for P 1, απ 1 for the contamination of P 1 π 2 for the non-contaminated P 2. Assume z 0 = z, π 2 < απ 1. f 2 (x)π 2 < f 1 (x)π 1 for all x (2z, 4z). Misclass. rate for group 2 is 1. D 2 (x)f 2 (x)π 2 > D 1 (x)f 1 (x)π 1 for x > 2z + 2π 2 z Misclass. rate for group 2 is π 2

32 Simulation study settings Group 1 Group 2 Ex. Distribution Parameters Distribution Parameters 1 Normal 0, Σ 0 Normal 1, Σ 0 2 Normal 0, Σ 0 Normal 1, 4Σ 0 3 Cauchy 0, Σ 0 Cauchy 1, Σ 0 4 Cauchy 0, Σ 0 Cauchy 1, 4Σ 0 5 Bivar. exponen. 1, 1 Shifted bivar. expon. (+1) 1, 1 6 Bivar. exponen. 1, 1/2 Shifted bivar. expon. (+1) 1/2, 1 7 Normal 0, I Bivar. exponential 1, 1 8 Skewed normal ( 1 1 Where Σ 0 = 1 4 ). ( 12 ), ( ), ( 2 5 ) Skewed normal ( 0 1 ), ( ) ( , 15 )

33 Percentage of points classified differently than by the Bayes classifier 30 Example 1 Example 2 Example 3 Example 4 Example 5 Example 6 Example 7 Example 8 Different classification, % pi1: 0.1 pi1: 0.3 pi1: 0.5 pi1: 0.7 pi1: 0.9 classifier DW RW 0 halfspaceprojection spatial halfspaceprojection spatial halfspaceprojection spatial halfspaceprojection spatial halfspaceprojection spatial halfspaceprojection spatial halfspaceprojection spatial halfspaceprojection spatial depth

34 Increase of AMR versus percentage of points classified differently than by the Bayes classifier halfspace depth projection depth spatial depth difference in AMR (%) difference in AMR (%) difference in AMR (%) Percentage of points classified differently Percentage of points classified differently Percentage of points classified differently depth optimal class. rank optimal class.

35 Conclusion We suggest to weight misclassification cost according to the centrality of a misclassified point measured either by its depth w.r.t. the distribution from which it comes (depth-weighted classifier) or by its rank w.r.t. the distribution from which it comes (rank-weighted classifier). In particular cases, the new classifiers does not differ from the Bayes classifier. Simulation study showed that increase in AMR is much smaller than percentage of points classified differently. Both classifiers depend on the depth function which they are using. In particular cases rank-weighted classifier does not depend on the used depth function. Presence of the depth term in the depth-based classifiers may substantially increase robustness of the procedure.

36 Ondrej Vencalek Department of Mathematical Analysis and Applications of Mathematics, Faculty of Science, Palacky University in Olomouc, 17. listopadu 12, Olomouc, Czech Republic

37 Happy end

Statistical Depth Function

Statistical Depth Function Machine Learning Journal Club, Gatsby Unit June 27, 2016 Outline L-statistics, order statistics, ranking. Instead of moments: median, dispersion, scale, skewness,... New visualization tools: depth contours,

More information

Intro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation

Intro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation Lecture 15. Pattern Classification (I): Statistical Formulation Outline Statistical Pattern Recognition Maximum Posterior Probability (MAP) Classifier Maximum Likelihood (ML) Classifier K-Nearest Neighbor

More information

Fast and Robust Classifiers Adjusted for Skewness

Fast and Robust Classifiers Adjusted for Skewness Fast and Robust Classifiers Adjusted for Skewness Mia Hubert 1 and Stephan Van der Veeken 2 1 Department of Mathematics - LStat, Katholieke Universiteit Leuven Celestijnenlaan 200B, Leuven, Belgium, Mia.Hubert@wis.kuleuven.be

More information

Robust estimation of principal components from depth-based multivariate rank covariance matrix

Robust estimation of principal components from depth-based multivariate rank covariance matrix Robust estimation of principal components from depth-based multivariate rank covariance matrix Subho Majumdar Snigdhansu Chatterjee University of Minnesota, School of Statistics Table of contents Summary

More information

Frequency Analysis & Probability Plots

Frequency Analysis & Probability Plots Note Packet #14 Frequency Analysis & Probability Plots CEE 3710 October 0, 017 Frequency Analysis Process by which engineers formulate magnitude of design events (i.e. 100 year flood) or assess risk associated

More information

Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring /

Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring / Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Combining Classifiers Empirical view Theoretical

More information

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)

Computer Vision Group Prof. Daniel Cremers. 2. Regression (cont.) Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori

More information

12 Discriminant Analysis

12 Discriminant Analysis 12 Discriminant Analysis Discriminant analysis is used in situations where the clusters are known a priori. The aim of discriminant analysis is to classify an observation, or several observations, into

More information

Scalable robust hypothesis tests using graphical models

Scalable robust hypothesis tests using graphical models Scalable robust hypothesis tests using graphical models Umamahesh Srinivas ipal Group Meeting October 22, 2010 Binary hypothesis testing problem Random vector x = (x 1,...,x n ) R n generated from either

More information

What does Bayes theorem give us? Lets revisit the ball in the box example.

What does Bayes theorem give us? Lets revisit the ball in the box example. ECE 6430 Pattern Recognition and Analysis Fall 2011 Lecture Notes - 2 What does Bayes theorem give us? Lets revisit the ball in the box example. Figure 1: Boxes with colored balls Last class we answered

More information

2 (Statistics) Random variables

2 (Statistics) Random variables 2 (Statistics) Random variables References: DeGroot and Schervish, chapters 3, 4 and 5; Stirzaker, chapters 4, 5 and 6 We will now study the main tools use for modeling experiments with unknown outcomes

More information

Algorithm-Independent Learning Issues

Algorithm-Independent Learning Issues Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning

More information

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable Distributions of Functions of Random Variables 5.1 Functions of One Random Variable 5.2 Transformations of Two Random Variables 5.3 Several Random Variables 5.4 The Moment-Generating Function Technique

More information

Classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012 Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative

More information

Multivariate quantiles and conditional depth

Multivariate quantiles and conditional depth M. Hallin a,b, Z. Lu c, D. Paindaveine a, and M. Šiman d a Université libre de Bruxelles, Belgium b Princenton University, USA c University of Adelaide, Australia d Institute of Information Theory and

More information

Machine Learning Lecture 5

Machine Learning Lecture 5 Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory

More information

Statistical Approaches to Learning and Discovery. Week 4: Decision Theory and Risk Minimization. February 3, 2003

Statistical Approaches to Learning and Discovery. Week 4: Decision Theory and Risk Minimization. February 3, 2003 Statistical Approaches to Learning and Discovery Week 4: Decision Theory and Risk Minimization February 3, 2003 Recall From Last Time Bayesian expected loss is ρ(π, a) = E π [L(θ, a)] = L(θ, a) df π (θ)

More information

Week 2. Review of Probability, Random Variables and Univariate Distributions

Week 2. Review of Probability, Random Variables and Univariate Distributions Week 2 Review of Probability, Random Variables and Univariate Distributions Probability Probability Probability Motivation What use is Probability Theory? Probability models Basis for statistical inference

More information

40.530: Statistics. Professor Chen Zehua. Singapore University of Design and Technology

40.530: Statistics. Professor Chen Zehua. Singapore University of Design and Technology Singapore University of Design and Technology Lecture 9: Hypothesis testing, uniformly most powerful tests. The Neyman-Pearson framework Let P be the family of distributions of concern. The Neyman-Pearson

More information

International Journal of Scientific & Engineering Research, Volume 6, Issue 2, February-2015 ISSN

International Journal of Scientific & Engineering Research, Volume 6, Issue 2, February-2015 ISSN 1686 On Some Generalised Transmuted Distributions Kishore K. Das Luku Deka Barman Department of Statistics Gauhati University Abstract In this paper a generalized form of the transmuted distributions has

More information

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 8. For any two events E and F, P (E) = P (E F ) + P (E F c ). Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 Sample space. A sample space consists of a underlying

More information

SYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I

SYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I SYDE 372 Introduction to Pattern Recognition Probability Measures for Classification: Part I Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 Why use probability

More information

1 Probability and Random Variables

1 Probability and Random Variables 1 Probability and Random Variables The models that you have seen thus far are deterministic models. For any time t, there is a unique solution X(t). On the other hand, stochastic models will result in

More information

Notes on Discriminant Functions and Optimal Classification

Notes on Discriminant Functions and Optimal Classification Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem

More information

Sample Spaces, Random Variables

Sample Spaces, Random Variables Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted

More information

Bayesian Decision Theory

Bayesian Decision Theory Bayesian Decision Theory Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent University) 1 / 46 Bayesian

More information

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability

More information

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak

Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,

More information

WEIGHTED HALFSPACE DEPTH

WEIGHTED HALFSPACE DEPTH K Y BERNETIKA VOLUM E 46 2010), NUMBER 1, P AGES 125 148 WEIGHTED HALFSPACE DEPTH Daniel Hlubinka, Lukáš Kotík and Ondřej Vencálek Generalised halfspace depth function is proposed. Basic properties of

More information

Lecture 8 October Bayes Estimators and Average Risk Optimality

Lecture 8 October Bayes Estimators and Average Risk Optimality STATS 300A: Theory of Statistics Fall 205 Lecture 8 October 5 Lecturer: Lester Mackey Scribe: Hongseok Namkoong, Phan Minh Nguyen Warning: These notes may contain factual and/or typographic errors. 8.

More information

robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression

robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression Robust Statistics robustness, efficiency, breakdown point, outliers, rank-based procedures, least absolute regression University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

STATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero

STATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero STATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero 1999 32 Statistic used Meaning in plain english Reduction ratio T (X) [X 1,..., X n ] T, entire data sample RR 1 T (X) [X (1),..., X (n) ] T, rank

More information

Prior Choice, Summarizing the Posterior

Prior Choice, Summarizing the Posterior Prior Choice, Summarizing the Posterior Statistics 220 Spring 2005 Copyright c 2005 by Mark E. Irwin Informative Priors Binomial Model: y π Bin(n, π) π is the success probability. Need prior p(π) Bayes

More information

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:

More information

Exploratory Data Analysis. CERENA Instituto Superior Técnico

Exploratory Data Analysis. CERENA Instituto Superior Técnico Exploratory Data Analysis CERENA Instituto Superior Técnico Some questions for which we use Exploratory Data Analysis: 1. What is a typical value? 2. What is the uncertainty around a typical value? 3.

More information

Classification 1: Linear regression of indicators, linear discriminant analysis

Classification 1: Linear regression of indicators, linear discriminant analysis Classification 1: Linear regression of indicators, linear discriminant analysis Ryan Tibshirani Data Mining: 36-462/36-662 April 2 2013 Optional reading: ISL 4.1, 4.2, 4.4, ESL 4.1 4.3 1 Classification

More information

Probability: Handout

Probability: Handout Probability: Handout Klaus Pötzelberger Vienna University of Economics and Business Institute for Statistics and Mathematics E-mail: Klaus.Poetzelberger@wu.ac.at Contents 1 Axioms of Probability 3 1.1

More information

Chapter 6: Functions of Random Variables

Chapter 6: Functions of Random Variables Chapter 6: Functions of Random Variables We are often interested in a function of one or several random variables, U(Y 1,..., Y n ). We will study three methods for determining the distribution of a function

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

ISyE 6416: Computational Statistics Spring Lecture 5: Discriminant analysis and classification

ISyE 6416: Computational Statistics Spring Lecture 5: Discriminant analysis and classification ISyE 6416: Computational Statistics Spring 2017 Lecture 5: Discriminant analysis and classification Prof. Yao Xie H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology

More information

STATISTICS SYLLABUS UNIT I

STATISTICS SYLLABUS UNIT I STATISTICS SYLLABUS UNIT I (Probability Theory) Definition Classical and axiomatic approaches.laws of total and compound probability, conditional probability, Bayes Theorem. Random variable and its distribution

More information

Random Variables and Their Distributions

Random Variables and Their Distributions Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital

More information

Topic 4: Continuous random variables

Topic 4: Continuous random variables Topic 4: Continuous random variables Course 003, 2018 Page 0 Continuous random variables Definition (Continuous random variable): An r.v. X has a continuous distribution if there exists a non-negative

More information

Ch 4. Linear Models for Classification

Ch 4. Linear Models for Classification Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,

More information

Machine Learning 2017

Machine Learning 2017 Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section

More information

The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet.

The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet. CS 189 Spring 013 Introduction to Machine Learning Final You have 3 hours for the exam. The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet. Please

More information

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology

More information

A Robust Approach to Regularized Discriminant Analysis

A Robust Approach to Regularized Discriminant Analysis A Robust Approach to Regularized Discriminant Analysis Moritz Gschwandtner Department of Statistics and Probability Theory Vienna University of Technology, Austria Österreichische Statistiktage, Graz,

More information

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber

Data Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2017 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields

More information

Bayes Decision Theory

Bayes Decision Theory Bayes Decision Theory Minimum-Error-Rate Classification Classifiers, Discriminant Functions and Decision Surfaces The Normal Density 0 Minimum-Error-Rate Classification Actions are decisions on classes

More information

MATH4427 Notebook 4 Fall Semester 2017/2018

MATH4427 Notebook 4 Fall Semester 2017/2018 MATH4427 Notebook 4 Fall Semester 2017/2018 prepared by Professor Jenny Baglivo c Copyright 2009-2018 by Jenny A. Baglivo. All Rights Reserved. 4 MATH4427 Notebook 4 3 4.1 K th Order Statistics and Their

More information

4 Invariant Statistical Decision Problems

4 Invariant Statistical Decision Problems 4 Invariant Statistical Decision Problems 4.1 Invariant decision problems Let G be a group of measurable transformations from the sample space X into itself. The group operation is composition. Note that

More information

CONTRIBUTIONS TO THE THEORY AND APPLICATIONS OF STATISTICAL DEPTH FUNCTIONS

CONTRIBUTIONS TO THE THEORY AND APPLICATIONS OF STATISTICAL DEPTH FUNCTIONS CONTRIBUTIONS TO THE THEORY AND APPLICATIONS OF STATISTICAL DEPTH FUNCTIONS APPROVED BY SUPERVISORY COMMITTEE: Robert Serfling, Chair Larry Ammann John Van Ness Michael Baron Copyright 1998 Yijun Zuo All

More information

Announcements. Proposals graded

Announcements. Proposals graded Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:

More information

Math 181B Homework 1 Solution

Math 181B Homework 1 Solution Math 181B Homework 1 Solution 1. Write down the likelihood: L(λ = n λ X i e λ X i! (a One-sided test: H 0 : λ = 1 vs H 1 : λ = 0.1 The likelihood ratio: where LR = L(1 L(0.1 = 1 X i e n 1 = λ n X i e nλ

More information

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows. Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage

More information

Bayesian decision theory Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory

Bayesian decision theory Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory Bayesian decision theory 8001652 Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory Jussi Tohka jussi.tohka@tut.fi Institute of Signal Processing Tampere University of Technology

More information

Deep Learning for Computer Vision

Deep Learning for Computer Vision Deep Learning for Computer Vision Lecture 3: Probability, Bayes Theorem, and Bayes Classification Peter Belhumeur Computer Science Columbia University Probability Should you play this game? Game: A fair

More information

Counting principles, including permutations and combinations.

Counting principles, including permutations and combinations. 1 Counting principles, including permutations and combinations. The binomial theorem: expansion of a + b n, n ε N. THE PRODUCT RULE If there are m different ways of performing an operation and for each

More information

Bayesian Decision Theory

Bayesian Decision Theory Bayesian Decision Theory Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Bayesian Decision Theory Bayesian classification for normal distributions Error Probabilities

More information

Chapter 3 sections. SKIP: 3.10 Markov Chains. SKIP: pages Chapter 3 - continued

Chapter 3 sections. SKIP: 3.10 Markov Chains. SKIP: pages Chapter 3 - continued Chapter 3 sections 3.1 Random Variables and Discrete Distributions 3.2 Continuous Distributions 3.3 The Cumulative Distribution Function 3.4 Bivariate Distributions 3.5 Marginal Distributions 3.6 Conditional

More information

Jensen s inequality for multivariate medians

Jensen s inequality for multivariate medians Jensen s inequality for multivariate medians Milan Merkle University of Belgrade, Serbia emerkle@etf.rs Given a probability measure µ on Borel sigma-field of R d, and a function f : R d R, the main issue

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

1 Introduction. P (n = 1 red ball drawn) =

1 Introduction. P (n = 1 red ball drawn) = Introduction Exercises and outline solutions. Y has a pack of 4 cards (Ace and Queen of clubs, Ace and Queen of Hearts) from which he deals a random of selection 2 to player X. What is the probability

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

Continuous Random Variables

Continuous Random Variables 1 / 24 Continuous Random Variables Saravanan Vijayakumaran sarva@ee.iitb.ac.in Department of Electrical Engineering Indian Institute of Technology Bombay February 27, 2013 2 / 24 Continuous Random Variables

More information

Modeling Uncertainty in the Earth Sciences Jef Caers Stanford University

Modeling Uncertainty in the Earth Sciences Jef Caers Stanford University Probability theory and statistical analysis: a review Modeling Uncertainty in the Earth Sciences Jef Caers Stanford University Concepts assumed known Histograms, mean, median, spread, quantiles Probability,

More information

Supervised Classification for Functional Data Using False Discovery Rate and Multivariate Functional Depth

Supervised Classification for Functional Data Using False Discovery Rate and Multivariate Functional Depth Supervised Classification for Functional Data Using False Discovery Rate and Multivariate Functional Depth Chong Ma 1 David B. Hitchcock 2 1 PhD Candidate University of South Carolina 2 Associate Professor

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2016 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

Naive Bayes and Gaussian Bayes Classifier

Naive Bayes and Gaussian Bayes Classifier Naive Bayes and Gaussian Bayes Classifier Mengye Ren mren@cs.toronto.edu October 18, 2015 Mengye Ren Naive Bayes and Gaussian Bayes Classifier October 18, 2015 1 / 21 Naive Bayes Bayes Rules: Naive Bayes

More information

5.6 The Normal Distributions

5.6 The Normal Distributions STAT 41 Lecture Notes 13 5.6 The Normal Distributions Definition 5.6.1. A (continuous) random variable X has a normal distribution with mean µ R and variance < R if the p.d.f. of X is f(x µ, ) ( π ) 1/

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

ENG 8801/ Special Topics in Computer Engineering: Pattern Recognition. Memorial University of Newfoundland Pattern Recognition

ENG 8801/ Special Topics in Computer Engineering: Pattern Recognition. Memorial University of Newfoundland Pattern Recognition Memorial University of Newfoundland Pattern Recognition Lecture 6 May 18, 2006 http://www.engr.mun.ca/~charlesr Office Hours: Tuesdays & Thursdays 8:30-9:30 PM EN-3026 Review Distance-based Classification

More information

Chapter 4. Chapter 4 sections

Chapter 4. Chapter 4 sections Chapter 4 sections 4.1 Expectation 4.2 Properties of Expectations 4.3 Variance 4.4 Moments 4.5 The Mean and the Median 4.6 Covariance and Correlation 4.7 Conditional Expectation SKIP: 4.8 Utility Expectation

More information

Naive Bayes and Gaussian Bayes Classifier

Naive Bayes and Gaussian Bayes Classifier Naive Bayes and Gaussian Bayes Classifier Ladislav Rampasek slides by Mengye Ren and others February 22, 2016 Naive Bayes and Gaussian Bayes Classifier February 22, 2016 1 / 21 Naive Bayes Bayes Rule:

More information

Generalized quantiles as risk measures

Generalized quantiles as risk measures Generalized quantiles as risk measures Bellini, Klar, Muller, Rosazza Gianin December 1, 2014 Vorisek Jan Introduction Quantiles q α of a random variable X can be defined as the minimizers of a piecewise

More information

Lecture 25: Review. Statistics 104. April 23, Colin Rundel

Lecture 25: Review. Statistics 104. April 23, Colin Rundel Lecture 25: Review Statistics 104 Colin Rundel April 23, 2012 Joint CDF F (x, y) = P [X x, Y y] = P [(X, Y ) lies south-west of the point (x, y)] Y (x,y) X Statistics 104 (Colin Rundel) Lecture 25 April

More information

ST5215: Advanced Statistical Theory

ST5215: Advanced Statistical Theory Department of Statistics & Applied Probability Wednesday, October 5, 2011 Lecture 13: Basic elements and notions in decision theory Basic elements X : a sample from a population P P Decision: an action

More information

Comments. x > w = w > x. Clarification: this course is about getting you to be able to think as a machine learning expert

Comments. x > w = w > x. Clarification: this course is about getting you to be able to think as a machine learning expert Logistic regression Comments Mini-review and feedback These are equivalent: x > w = w > x Clarification: this course is about getting you to be able to think as a machine learning expert There has to be

More information

MIT Spring 2015

MIT Spring 2015 Assessing Goodness Of Fit MIT 8.443 Dr. Kempthorne Spring 205 Outline 2 Poisson Distribution Counts of events that occur at constant rate Counts in disjoint intervals/regions are independent If intervals/regions

More information

Continuous Distributions

Continuous Distributions Chapter 3 Continuous Distributions 3.1 Continuous-Type Data In Chapter 2, we discuss random variables whose space S contains a countable number of outcomes (i.e. of discrete type). In Chapter 3, we study

More information

Nonlinear Signal Processing ELEG 833

Nonlinear Signal Processing ELEG 833 Nonlinear Signal Processing ELEG 833 Gonzalo R. Arce Department of Electrical and Computer Engineering University of Delaware arce@ee.udel.edu May 5, 2005 8 MYRIAD SMOOTHERS 8 Myriad Smoothers 8.1 FLOM

More information

Set Theory Digression

Set Theory Digression 1 Introduction to Probability 1.1 Basic Rules of Probability Set Theory Digression A set is defined as any collection of objects, which are called points or elements. The biggest possible collection of

More information

Bell-shaped curves, variance

Bell-shaped curves, variance November 7, 2017 Pop-in lunch on Wednesday Pop-in lunch tomorrow, November 8, at high noon. Please join our group at the Faculty Club for lunch. Means If X is a random variable with PDF equal to f (x),

More information

Minimum Error Rate Classification

Minimum Error Rate Classification Minimum Error Rate Classification Dr. K.Vijayarekha Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur-613 401 Table of Contents 1.Minimum Error Rate Classification...

More information

Topic 4: Continuous random variables

Topic 4: Continuous random variables Topic 4: Continuous random variables Course 3, 216 Page Continuous random variables Definition (Continuous random variable): An r.v. X has a continuous distribution if there exists a non-negative function

More information

Gaussian discriminant analysis Naive Bayes

Gaussian discriminant analysis Naive Bayes DM825 Introduction to Machine Learning Lecture 7 Gaussian discriminant analysis Marco Chiarandini Department of Mathematics & Computer Science University of Southern Denmark Outline 1. is 2. Multi-variate

More information

Applications of Information Geometry to Hypothesis Testing and Signal Detection

Applications of Information Geometry to Hypothesis Testing and Signal Detection CMCAA 2016 Applications of Information Geometry to Hypothesis Testing and Signal Detection Yongqiang Cheng National University of Defense Technology July 2016 Outline 1. Principles of Information Geometry

More information

Sparse Kernel Machines - SVM

Sparse Kernel Machines - SVM Sparse Kernel Machines - SVM Henrik I. Christensen Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu Henrik I. Christensen (RIM@GT) Support

More information

Machine Learning Lecture 7

Machine Learning Lecture 7 Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant

More information

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 18

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 18 EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 18 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 31, 2012 Andre Tkacenko

More information

Functions of Random Variables

Functions of Random Variables Functions of Random Variables Statistics 110 Summer 2006 Copyright c 2006 by Mark E. Irwin Functions of Random Variables As we ve seen before, if X N(µ, σ 2 ), then Y = ax + b is also normally distributed.

More information

Continuous Random Variables. and Probability Distributions. Continuous Random Variables and Probability Distributions ( ) ( ) Chapter 4 4.

Continuous Random Variables. and Probability Distributions. Continuous Random Variables and Probability Distributions ( ) ( ) Chapter 4 4. UCLA STAT 11 A Applied Probability & Statistics for Engineers Instructor: Ivo Dinov, Asst. Prof. In Statistics and Neurology Teaching Assistant: Christopher Barr University of California, Los Angeles,

More information

HW1 solutions. 1. α Ef(x) β, where Ef(x) is the expected value of f(x), i.e., Ef(x) = n. i=1 p if(a i ). (The function f : R R is given.

HW1 solutions. 1. α Ef(x) β, where Ef(x) is the expected value of f(x), i.e., Ef(x) = n. i=1 p if(a i ). (The function f : R R is given. HW1 solutions Exercise 1 (Some sets of probability distributions.) Let x be a real-valued random variable with Prob(x = a i ) = p i, i = 1,..., n, where a 1 < a 2 < < a n. Of course p R n lies in the standard

More information

LOWELL WEEKLY JOURNAL. ^Jberxy and (Jmott Oao M d Ccmsparftble. %m >ai ruv GEEAT INDUSTRIES

LOWELL WEEKLY JOURNAL. ^Jberxy and (Jmott Oao M d Ccmsparftble. %m >ai ruv GEEAT INDUSTRIES ? (») /»» 9 F ( ) / ) /»F»»»»»# F??»»» Q ( ( »»» < 3»» /» > > } > Q ( Q > Z F 5

More information

LECTURE NOTES 57. Lecture 9

LECTURE NOTES 57. Lecture 9 LECTURE NOTES 57 Lecture 9 17. Hypothesis testing A special type of decision problem is hypothesis testing. We partition the parameter space into H [ A with H \ A = ;. Wewrite H 2 H A 2 A. A decision problem

More information

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R.

Ergodic Theorems. Samy Tindel. Purdue University. Probability Theory 2 - MA 539. Taken from Probability: Theory and examples by R. Ergodic Theorems Samy Tindel Purdue University Probability Theory 2 - MA 539 Taken from Probability: Theory and examples by R. Durrett Samy T. Ergodic theorems Probability Theory 1 / 92 Outline 1 Definitions

More information

Journal of Biostatistics and Epidemiology

Journal of Biostatistics and Epidemiology Journal of Biostatistics and Epidemiology Original Article Robust correlation coefficient goodness-of-fit test for the Gumbel distribution Abbas Mahdavi 1* 1 Department of Statistics, School of Mathematical

More information

Classification. 1. Strategies for classification 2. Minimizing the probability for misclassification 3. Risk minimization

Classification. 1. Strategies for classification 2. Minimizing the probability for misclassification 3. Risk minimization Classification Volker Blobel University of Hamburg March 2005 Given objects (e.g. particle tracks), which have certain features (e.g. momentum p, specific energy loss de/ dx) and which belong to one of

More information