Classification using data depth: new ideas
1 Classification using data depth: new ideas. Ondřej Vencálek, Univerzita Palackého v Olomouci. ROBUST, Rybník, 26 January 2018
2 (image-only slide)
3 Z. Fabián: Vzpomínka na Compstat 2010 aneb večeře na Pařížské radnici [A memory of Compstat 2010, or a dinner at the Paris city hall], Informační Bulletin ČStS, 4/2017: "The idea of holding a conference in Paris ranks, with all due dignity, alongside the idea of holding the World Cup in Qatar or the Summer Olympics in Letňany...."
4 It is deep inside = it has high depth
5 It is deep inside = it has high depth Trees DEEP inside the forest
6 It is deep inside = it has high depth Trees DEEP inside the forest Trees far from the center
7 It is deep inside = it has high depth
8 Outliers in R^1: an outlier
9 Outliers in R^2: any idea?
10 1st idea: Mahalanobis distance
11 1st idea: Mahalanobis distance, non-elliptical case
12 1st idea: Mahalanobis distance, non-elliptical case
13 Halfspace depth in R^1: D(x) = min{P(X ≤ x), P(X ≥ x)}. Example: univariate normal distribution N(µ, σ^2). (Figure: the density of x, with the deepest point (the median) in the centre, depth decreasing towards both tails, and outliers in the tails.)
14 Halfspace depth in R^d, projection pursuit approach: D(x) = inf_{u: ||u|| = 1} D(u^T x). (Figure: a two-dimensional point cloud with axes x_1 and x_2.)
15 The halfspace depth of a point x ∈ R^d with respect to a probability measure P on R^d is defined as the minimum probability mass carried by any closed halfspace containing x, that is, D(x; P) = inf{P(H) : H a closed halfspace, x ∈ H}. The depth provides a so-called centre-outward ordering of the data. The deepest point extends the notion of the median to higher dimensions.
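The infimum over halfspaces is expensive to evaluate exactly in higher dimensions, but the projection pursuit formula above suggests a simple Monte Carlo approximation: minimise the univariate depth over many random unit directions. A minimal sketch in Python (the helper halfspace_depth is illustrative, not code from the talk; it works with an empirical sample rather than the measure P):

```python
import numpy as np

def halfspace_depth(x, data, n_dir=1000, seed=None):
    """Approximate halfspace depth of x w.r.t. an empirical sample.

    Uses D(x) = inf over unit directions u of the univariate depth
    min{P(u'X <= u'x), P(u'X >= u'x)}; the infimum is approximated
    by a minimum over n_dir random directions.
    """
    rng = np.random.default_rng(seed)
    n, d = data.shape
    u = rng.standard_normal((n_dir, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # unit directions
    proj_data = data @ u.T        # (n, n_dir): projected sample
    proj_x = u @ x                # (n_dir,):   projected query point
    below = (proj_data <= proj_x).mean(axis=0)
    above = (proj_data >= proj_x).mean(axis=0)
    return float(np.minimum(below, above).min())

# The centre of a cloud is deep; a far-away point is shallow.
sample = np.random.default_rng(0).standard_normal((500, 2))
print(halfspace_depth(np.zeros(2), sample, seed=1))          # ~ 0.5
print(halfspace_depth(np.array([4.0, 0.0]), sample, seed=1)) # ~ 0.0
```

Because the minimum runs over only a finite set of directions, the result slightly overestimates the true depth; more directions tighten the approximation.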
16 Problem of classification. The Bayes classifier, class(x) = arg max_i f_i(x) π_i, produces the smallest possible expected number of misclassified points (totalled over all groups).
17 When Bayes misclassifies the centre of the distribution... P_1 = N(0, 1), P_2 = N(1, 1), π_1 = 0.7, π_2 = 0.3. (Figure: the posterior probabilities of the two groups as functions of x.)
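A short check of why the centre is misclassified, using only the densities and priors above: with unit variances the likelihood ratio is f_2(x)/f_1(x) = e^{x − 1/2}, so the Bayes classifier assigns x to P_2 iff

$$
\frac{f_2(x)\,\pi_2}{f_1(x)\,\pi_1} = e^{x-\frac{1}{2}}\cdot\frac{0.3}{0.7} > 1
\iff x > \frac{1}{2} + \ln\frac{7}{3} \approx 1.35,
$$

and hence x = 1, the very centre of P_2, is assigned to P_1.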
18 Hey, Bayes, be fair to x_2! Lognormal distribution (obtained from N(0, 1)). (Figure: the lognormal density with two points marked.) The 0.05-quantile is x_1, with f(x_1) = 0.53; the 0.95-quantile is x_2, with f(x_2) = 0.02. The two points are equally extreme in terms of quantiles, yet their densities differ greatly, so the Bayes classifier treats them very differently.
19 Basics of decision theory. Distributions P_i are defined on R^d, with densities f_i and prior probabilities π_i, i = 1, ..., K. A classifier divides the space R^d into K disjoint parts A_i, i = 1, ..., K, with ∪_{i=1}^K A_i = R^d, such that any x ∈ R^d is assigned to P_i iff x ∈ A_i. Cost function: c_ij(x) = c_i(x) if j ≠ i, and 0 if j = i. Total cost to be minimised: Σ_{i=1}^K Σ_{j≠i} ∫_{A_j} c_i(x) f_i(x) π_i dx. Since minimising this cost is equivalent to maximising Σ_i ∫_{A_i} c_i(x) f_i(x) π_i dx, the optimal classifier chooses the maximal term pointwise: class(x) = arg max_i c_i(x) f_i(x) π_i.
20 Bayes classifier and its 2 new competitors, class(x) = arg max_i c_i(x) f_i(x) π_i:
1. Bayes classifier: c_i(x) = 1, giving class_B(x) = arg max_i f_i(x) π_i.
2. Depth-weighted classifier: c_i(x) = D(x; P_i), giving class_D(x) = arg max_i D(x; P_i) f_i(x) π_i.
3. Rank-weighted classifier: c_i(x) = F_i(D(x; P_i)), where F_i is the CDF of D(X; P_i) for X ~ P_i, giving class_R(x) = arg max_i F_i(D(x; P_i)) f_i(x) π_i.
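A minimal sketch of the three rules in Python for two known univariate normal populations (the helpers depth, rank and classify are illustrative, not code from the talk; in the univariate continuous case F_i(D(x; P_i)) = 2 D(x; P_i), since D(X) is uniform on [0, 1/2] under P_i):

```python
import numpy as np
from scipy.stats import norm

# Two known univariate populations (the setting of slide 17).
pi = np.array([0.7, 0.3])
dists = [norm(0, 1), norm(1, 1)]

def depth(x, dist):
    # Univariate halfspace depth: min{P(X <= x), P(X >= x)}.
    return np.minimum(dist.cdf(x), dist.sf(x))

def rank(x, dist):
    # F_i(D(x; P_i)): D(X) ~ Unif[0, 1/2] under P_i, so the CDF is 2t.
    return 2 * depth(x, dist)

def classify(x, weight=None):
    # weight=None -> Bayes rule; otherwise weight plays the role of c_i(x).
    scores = np.array([d.pdf(x) * p for d, p in zip(dists, pi)])
    if weight is not None:
        scores *= np.array([weight(x, d) for d in dists])
    return scores.argmax(axis=0) + 1   # group label 1 or 2

x = np.linspace(-2.0, 3.0, 11)
print(classify(x))                 # class_B
print(classify(x, weight=depth))   # class_D
print(classify(x, weight=rank))    # class_R
```

Comparing the printed label vectors shows where the depth term moves the decision boundary relative to Bayes; here class_D and class_R coincide, since the factor 2 is common to both groups.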
21 Toy example: P_1 = Unif[0, 100], π_1 = 0.5; P_2 = Unif[50, 250], π_2 = 0.5. Classification on the overlapping part of the supports: the Bayes classifier gives class_B(x) = 1 for x ∈ [50, 100]; the depth-weighted classifier gives class_D(x) = 1 for x ∈ [50, 90) and class_D(x) = 2 for x ∈ (90, 100].
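The boundary at 90 follows directly from the definitions (a short check using the univariate halfspace depth): on the overlap [50, 100] we have f_1 = 1/100, D_1(x) = (100 − x)/100, f_2 = 1/200 and D_2(x) = (x − 50)/200, so with equal priors

$$
D_1(x)\,f_1(x) = \frac{100-x}{10\,000}
\quad\text{and}\quad
D_2(x)\,f_2(x) = \frac{x-50}{40\,000},
$$

which are equal iff 4(100 − x) = x − 50, i.e. at x = 90.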
22 (image-only slide)
23 The first question (or two): To what extent do the new (depth-weighted) classifiers differ from the Bayes classifier? How large might the corresponding difference in the average misclassification rate be?
24 Difference in the case of elliptical symmetry. Let us assume:
(P1) f_i(x) = k_i g(M_i(x)), where g is a strictly decreasing function, k_i > 0 are constants, and M_i(x) = ((x − α_i)^T B_i^{−1} (x − α_i))^{1/2}, with B_i positive definite and α_i ∈ R^d, denotes the generalized distance of the point x from the centre of the distribution P_i.
(P2) D(x) is an affine-invariant depth.
Then, given (P1), D_i(x) = D(x; P_i) is a fixed decreasing function of the generalized distance, which can be expressed as D_i(x) = h(M_i(x)), where h is a strictly decreasing function. Denote
G(x) = k_1 g(M_1(x)) / (k_2 g(M_2(x))) (the likelihood ratio),
H(x) = h(M_1(x)) / h(M_2(x)) (the depth ratio),
π = π_2 / π_1 (the inverse prior ratio).
The Bayes classifier assigns x to P_1 if G(x) > π; the depth-weighted classifier assigns x to P_1 if H(x)G(x) > π.
25 Difference in the case of elliptical symmetry. For x with M_1(x) > M_2(x) we have h(M_1(x)) < h(M_2(x)), so H(x) < 1 and H(x)G(x) < G(x); for x with M_1(x) < M_2(x) the ordering is reversed. (Figure: the two orderings of H(x)G(x), G(x) and π on the half-line, with the value k_1/k_2 marked.) The classifiers differ when π lies between G(x) and H(x)G(x): for fixed π, the region where the classifiers differ is
RD(π) = {x ∈ R^d : H(x)G(x) < π < G(x) or G(x) < π < H(x)G(x)}.
Let (P1) and (P2) hold for P_1 and P_2. Then P(class_B(X) ≠ class_D(X)) = 0 iff π_1 k_1 = π_2 k_2.
26 Difference in the case of elliptical symmetry: example. P_1 = N(−1, 1), P_2 = N(1, 1). (Figure: the probability of different classification as a function of log(π_1/π_2).)
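The theorem above can be checked by simulation; since k_1 = k_2 here, the probability of different classification should vanish exactly at π_1 = π_2. A sketch (hypothetical, not code from the talk), reusing the univariate depth helper from the earlier sketch:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
p1, p2 = norm(-1, 1), norm(1, 1)

def depth(x, dist):
    # Univariate halfspace depth: min{P(X <= x), P(X >= x)}.
    return np.minimum(dist.cdf(x), dist.sf(x))

n = 200_000
for log_ratio in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    pi1 = 1.0 / (1.0 + np.exp(-log_ratio))   # pi1/pi2 = exp(log_ratio)
    pi2 = 1.0 - pi1
    # Draw X from the mixture pi1*P1 + pi2*P2.
    from_p1 = rng.random(n) < pi1
    x = np.where(from_p1,
                 p1.rvs(n, random_state=rng),
                 p2.rvs(n, random_state=rng))
    bayes = p1.pdf(x) * pi1 > p2.pdf(x) * pi2
    depthw = depth(x, p1) * p1.pdf(x) * pi1 > depth(x, p2) * p2.pdf(x) * pi2
    print(f"log(pi1/pi2) = {log_ratio:+.1f}: "
          f"P(different) ~ {np.mean(bayes != depthw):.4f}")
```

The estimate at log(π_1/π_2) = 0 should be (numerically) zero and grow as the priors become more unbalanced, matching the shape of the curve on the slide.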
27 Next question: To what extent does the performance of the new classifiers depend on the choice of the depth function? Which depth function results in the smallest (biggest) differences from the Bayes classifier?
28 Difference between the depth-based classifiers using the halfspace and projection depths. P_1 = Unif[0, 100], π_1 = 0.5; P_2 = Unif[50, 250], π_2 = 0.5. (Figure: D(x)f(x)π as a function of x, once for the halfspace depth and once for the projection depth.)
29 Rank-weighted classifier. Depth-weighted classifier: c_i(x) = D(x; P_i), giving class_D(x) = arg max_i D(x; P_i) f_i(x) π_i. Rank-weighted classifier: c_i(x) = F_i(D(x; P_i)), where F_i is the CDF of D(X; P_i) for X ~ P_i, giving class_R(x) = arg max_i F_i(D(x; P_i)) f_i(x) π_i. Let (P1) and (P2) hold for P_1 and P_2 and for any two depth functions D(·) and D*(·). Then class_R(x) is the same for both D(·) and D*(·) with probability one. (Intuitively, under (P1) and (P2) the rank F_i(D(x; P_i)) = P(M_i(X) ≥ M_i(x)) depends only on the generalized distance, not on the particular depth function.)
30 Robustness of the newly proposed classifiers. (Figure: densities on the x-axis, with marks at 0, z, 2z, 3z − z_0, 3z, 3z + z_0 and 4z.) Mixture weights: (1 − α)π_1 for P_1, απ_1 for the contamination of P_1, and π_2 for the non-contaminated P_2. Assume z_0 = z and π_2 < απ_1.
31 Robustness of the newly proposed classifiers (continued, same setting and figure as above). Then f_2(x)π_2 < f_1(x)π_1 for all x ∈ (2z, 4z), so the Bayes misclassification rate for group 2 is 1. Since D_2(x)f_2(x)π_2 > D_1(x)f_1(x)π_1 for x > 2z + 2π_2 z, the misclassification rate of the depth-weighted classifier for group 2 is only π_2.
32 Simulation study settings (Group 1 vs. Group 2):
Ex. 1: Normal(0, Σ_0) vs. Normal(1, Σ_0)
Ex. 2: Normal(0, Σ_0) vs. Normal(1, 4Σ_0)
Ex. 3: Cauchy(0, Σ_0) vs. Cauchy(1, Σ_0)
Ex. 4: Cauchy(0, Σ_0) vs. Cauchy(1, 4Σ_0)
Ex. 5: Bivariate exponential(1, 1) vs. shifted (+1) bivariate exponential(1, 1)
Ex. 6: Bivariate exponential(1, 1/2) vs. shifted (+1) bivariate exponential(1/2, 1)
Ex. 7: Normal(0, I) vs. bivariate exponential(1, 1)
Ex. 8: Skewed normal vs. skewed normal
Here Σ_0 is a fixed 2×2 covariance matrix (diagonal entries 1 and 4).
33 Percentage of points classified differently than by the Bayes classifier. (Figure: for Examples 1–8 and priors π_1 ∈ {0.1, 0.3, 0.5, 0.7, 0.9}, the percentage of differently classified points of the depth-weighted (DW) and rank-weighted (RW) classifiers, using the halfspace, projection and spatial depths.)
34 Increase of AMR versus percentage of points classified differently than by the Bayes classifier. (Figure: the difference in AMR (%) plotted against the percentage of differently classified points, separately for the halfspace, projection and spatial depths; both the depth-optimal and the rank-optimal classifiers are shown.)
35 Conclusion. We suggest weighting the misclassification cost according to the centrality of a misclassified point, measured either by its depth w.r.t. the distribution from which it comes (depth-weighted classifier) or by its rank w.r.t. that distribution (rank-weighted classifier). In particular cases the new classifiers do not differ from the Bayes classifier. The simulation study showed that the increase in AMR is much smaller than the percentage of points classified differently. Both classifiers depend on the depth function they use; in particular cases the rank-weighted classifier does not depend on the depth function used. The presence of the depth term in the depth-based classifiers may substantially increase the robustness of the procedure.
36 Ondrej Vencalek Department of Mathematical Analysis and Applications of Mathematics, Faculty of Science, Palacky University in Olomouc, 17. listopadu 12, Olomouc, Czech Republic
37 Happy end