Smart Home Health Analytics Information Systems University of Maryland Baltimore County


1 Smart Home Health Analytics Information Systems University of Maryland Baltimore County 1

2 IEEE Expert, October 1996

3 Given sample S from all possible examples D, learner L learns hypothesis h based on S. Sample error: error_S(h). True error: error_D(h). Example: hypothesis h misclassifies 12 of 40 examples in S, so error_S(h) = 12/40 = 0.3. What is error_D(h)?

4 Learner A learns hypothesis h_A on sample S; learner B learns hypothesis h_B on sample S. Observe: error_S(h_A) < error_S(h_B). Is error_D(h_A) < error_D(h_B)? Is learner A better than learner B?

5 How can we estimate the true error of a classifier? How can we determine if one learner is better than another? Using the sample error is too optimistic. Using the error on a separate test set is better, but might still be misleading. Repeating the above for multiple iterations, each with different training/testing sets, yields a better estimate of the true error.

6 David Wolpert, 1995: for any learning algorithm there are datasets for which it does well, and datasets for which it does poorly. Performance estimates are based on specific datasets, not an estimate of the learner on all datasets. There is no one best learning algorithm.

7 Multiple iterations of learning on a training set and testing on a separate validation set are only for evaluation and parameter tuning; final learning should be done on all available data. If the validation set is used to choose/tune a learning method, then it cannot also be used to compare performance against another learning algorithm. We need yet another test set that is unseen during tuning/learning.

8 Error costs (false positives vs. false negatives) Training time and space complexity Testing time and space complexity Interpretability Ease of implementation 8

9 Given dataset X, for each of K trials randomly divide X into a training set (2/3) and a testing set (1/3), learn the classifier on the training set, and test it on the testing set (compute the error). Compute the average error over the K trials. Problem: the training and testing sets overlap between trials, which biases the results.
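
A minimal sketch of this resampling loop, assuming scikit-learn, a decision tree, and the iris data as stand-ins for the learner and the dataset X (none of these are specified on the slide):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
K = 10
errors = []
for k in range(K):
    # 2/3 training, 1/3 testing, re-drawn each trial
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=k)
    clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    errors.append(np.mean(clf.predict(X_te) != y_te))   # error on this trial's test set
print(f"average error over {K} trials: {np.mean(errors):.3f}")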

10 K-fold cross validation: given dataset X, partition X into K disjoint sets X_1, ..., X_K. For i = 1 to K: learn the classifier on the training set X - X_i and test it on the testing set X_i (compute the error). Compute the average error over the K trials. The testing sets no longer overlap, but the training sets still overlap.
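
The same loop with disjoint test folds; KFold below is a stand-in for the partitioning described above, again on placeholder data and learner (StratifiedKFold would give the stratified variant mentioned on the next slide):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
errors = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    clf = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    errors.append(np.mean(clf.predict(X[test_idx]) != y[test_idx]))
print(f"10-fold CV error: {np.mean(errors):.3f}")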

11 Stratification: the distribution of classes in the training and testing sets should be the same as in the original dataset; this is called stratified cross validation. Leave-one-out cross validation: K = N = |X|; used when classified data is scarce, e.g., medical diagnosis.

12 5x2 cross validation (Tom Dietterich, 1998): for each of 5 trials (shuffling X each time), divide X randomly into two halves X_1 and X_2; compute the error using X_1 for training and X_2 for testing, then using X_2 for training and X_1 for testing. Compute the average error over all 10 results. Five trials is the best number to minimize overlap among training and testing sets.
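
A sketch of the 5x2 procedure under the same placeholder dataset and learner assumptions as above:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
errors = []
for trial in range(5):
    # shuffle and split into two halves
    X1, X2, y1, y2 = train_test_split(X, y, test_size=0.5, random_state=trial)
    for Xtr, ytr, Xte, yte in [(X1, y1, X2, y2), (X2, y2, X1, y1)]:
        clf = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr)
        errors.append(np.mean(clf.predict(Xte) != yte))
print(f"mean of the 10 error estimates: {np.mean(errors):.3f}")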

13 Bootstrapping: if there is not enough data for k-fold cross validation, generate multiple samples of size N from X by sampling with replacement. Each sample contains approximately 63% of the examples in X. Compute the average error over all samples.

14 Draw N instances from a dataset of size N with replacement. The probability that a given instance is not picked after N draws is (1 - 1/N)^N ≈ e^{-1} = 0.368; that is, only 36.8% of the data is new (never drawn into the bootstrap sample).
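
A quick numeric check of the 36.8% figure; this is a pure simulation with numpy and involves no assumptions beyond the slide:

import numpy as np

rng = np.random.default_rng(0)
N = 10_000
sample = rng.integers(0, N, size=N)               # one bootstrap sample: N draws with replacement
frac_not_picked = 1 - len(np.unique(sample)) / N  # fraction of instances never drawn
print(frac_not_picked)                            # roughly 0.368
print((1 - 1/N) ** N, np.exp(-1))                 # analytic value and its limit e^-1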

15 Confusion matrix (rows: true class, columns: predicted class):
                    Predicted positive      Predicted negative      Total
  True positive     tp: true positive       fn: false negative      p
  True negative     fp: false positive      tn: true negative       n
  Total             p'                      n'                      N

16 Name            Formula
   error           (fp + fn)/N
   accuracy        (tp + tn)/N
   tp-rate         tp/p
   fp-rate         fp/n
   precision       tp/p'
   recall          tp/p = tp-rate
   sensitivity     tp/p = tp-rate
   specificity     tn/n = 1 - fp-rate
   F-measure       F = 2 * precision * recall / (precision + recall)

17 Error rate = # of errors / # of instances = (FN + FP)/N. Precision = # of found positives / # of found = TP/(TP + FP). Recall = # of found positives / # of positives = TP/(TP + FN) = sensitivity = hit rate. Specificity = TN/(TN + FP). False alarm rate = FP/(FP + TN) = 1 - specificity.
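
These formulas applied to a single confusion matrix; the counts below are made-up values chosen only to illustrate the arithmetic:

tp, fn, fp, tn = 30, 10, 5, 55      # hypothetical confusion-matrix counts
N = tp + fn + fp + tn
error       = (fp + fn) / N
accuracy    = (tp + tn) / N
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)        # = tp-rate = sensitivity = hit rate
specificity = tn / (tn + fp)        # = 1 - false alarm rate
false_alarm = fp / (fp + tn)
f_measure   = 2 * precision * recall / (precision + recall)
print(error, accuracy, precision, recall, specificity, false_alarm, f_measure)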

19 Sensitivity is the same as tp-rate and recall. Specificity is how well we detect the negatives: # of true negatives / total # of negatives = 1 - false alarm rate. A sensitivity vs. specificity curve can be plotted for different thresholds.

20 === Run information ===
Scheme:     weka.classifiers.rules.OneR -B 6
Relation:   labor-neg-data
Instances:  57
Attributes: 17
            duration
            wage-increase-first-year
            wage-increase-second-year
            wage-increase-third-year
            cost-of-living-adjustment
            working-hours
            pension
            standby-pay
            shift-differential
            education-allowance
            statutory-holidays
            vacation
            longterm-disability-assistance
            contribution-to-dental-plan
            bereavement-assistance
            contribution-to-health-plan
            class

21 Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
wage-increase-first-year:
  < 2.9  -> bad
  >= 2.9 -> good
  ?      -> good
(43/57 instances correct)
Time taken to build model: 0 seconds

22 === Stratified cross-validation === === Summary === Correctly Classified Instances % Incorrectly Classified Instances % Kappa statistic Mean absolute error Root mean squared error Relative absolute error % Root relative squared error % Coverage of cases (0.95 level) % Mean rel. region size (0.95 level) 50 % Total Number of Instances 57 22

23 === Detailed Accuracy By Class ===
(TP Rate, FP Rate, Precision, Recall, F-Measure, ROC Area for class bad, class good, and the weighted average)
=== Confusion Matrix ===
  a  b   <-- classified as
  9 11 |  a = bad
  3 34 |  b = good
Equivalently, predicted vs. true class:
                  Predicted a (bad)   Predicted b (good)   Total
  True a (bad)            9                  11              20
  True b (good)           3                  34              37
  Total                  12                  45              57

24 Most comparisons of machine learning algorithms use classification error. Problems with this approach: there may be different costs associated with false positive and false negative errors, and the training data may not reflect the true class distribution.

25 Receiver Operating Characteristic (ROC): originated in signal detection theory; common in medical diagnosis; becoming common in ML evaluations. ROC curves assess predictive behavior independent of error costs or class distributions. Area Under the ROC Curve (AUC): a single measure of learning algorithm performance, independent of error costs and class distributions.

26 [Figure: ROC curves for learners L1, L2, and L3 and a random classifier; x-axis: false positive rate, y-axis: true positive rate]

27 Learner L1 dominates L2 if L1's ROC curve is always above L2's curve. If L1 dominates L2, then L1 is better than L2 for all possible error costs and class distributions. If neither dominates (as with L2 and L3), then different classifiers are better under different conditions.

28 Assume the classifier outputs P(C|x) instead of just C (the predicted class for instance x). Let θ be a threshold such that if P(C|x) > θ, then x is classified as C, else not C. Compute the fp-rate and tp-rate for different values of θ from 0 to 1. Plot each (fp-rate, tp-rate) point and interpolate (or take the convex hull). If there are multiple points with the same fp-rate (e.g., from k-fold cross-validation), average the tp-rates.
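
A sketch of this threshold sweep on made-up labels and scores; the area under the interpolated curve is computed with the trapezoidal rule:

import numpy as np

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])                     # made-up class labels
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1])    # made-up P(C|x) values
p, n = (y_true == 1).sum(), (y_true == 0).sum()

points = set()
for theta in np.concatenate(([-np.inf, np.inf], scores)):       # sweep the threshold
    pred = scores > theta
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    points.add((fp / n, tp / p))
fpr, tpr = zip(*sorted(points))
auc = sum((fpr[i+1] - fpr[i]) * (tpr[i+1] + tpr[i]) / 2 for i in range(len(fpr) - 1))
print(list(zip(fpr, tpr)))
print("AUC:", auc)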

29 What if the classifier does not provide P(C|x), but just C (e.g., a decision tree or rule learner)? Generally, even these discrete classifiers maintain statistics for classification: decision tree leaf nodes record the proportion of examples of each class, and rules record the number of examples covered by the rule. These statistics can be compared against a varying threshold θ.

31 [Figure: ROC curves for J48 vs. NN on the labor dataset; x-axis: false positive rate, y-axis: true positive rate]

33 We have seen several ways to estimate learning performance: train/test split, cross-validation, ROC, AUC. But how good are these at estimating the true performance? E.g., is error_S(h) ≈ error_D(h)?

34 Estimate the mean μ of a normal distribution N(μ, σ²). Given a sample X = {x_t} of size N, estimate m = Σ_t x_t / N, where m ~ N(μ, σ²/N). Define a statistic with unit normal distribution N(0,1): √N (m - μ)/σ ~ Z.

35 95% of Z lies in (-1.96,1.96) 99% of Z lies in (-2.58, 2.58) P(-1.96 < Z < 1.96) = 0.95 Two-sided confidence interval 35

36 Two-sided confidence interval: P(-z_{α/2} < √N (m - μ)/σ < z_{α/2}) = 1 - α, hence P(m - z_{α/2} σ/√N < μ < m + z_{α/2} σ/√N) = 1 - α; with α = 0.05, P(m - 1.96 σ/√N < μ < m + 1.96 σ/√N) = 0.95.
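
A numeric illustration of the two-sided interval m ± z_{α/2} σ/√N; the sample values and the "known" σ below are invented for the example:

import numpy as np

x = np.array([0.62, 0.58, 0.65, 0.61, 0.60, 0.63, 0.59, 0.64, 0.62, 0.60])  # made-up sample
sigma = 0.02                       # assumed known standard deviation
N, m = len(x), x.mean()
z = 1.96                           # z_{alpha/2} for alpha = 0.05
half = z * sigma / np.sqrt(N)
print(m - half, m + half)          # 95% two-sided confidence interval for mu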

37 One-sided confidence interval: P(√N (m - μ)/σ < z_α) = 1 - α, hence P(m - z_α σ/√N < μ) = 1 - α; with z_{0.05} = 1.64, P(m - 1.64 σ/√N < μ) = 0.95.

38 The previous analysis requires that we know σ². We can instead use the sample variance S² = Σ_t (x_t - m)² / (N - 1). When x_t ~ N(μ, σ²), (N - 1)S²/σ² is chi-squared with N - 1 degrees of freedom. Since m and S² are independent, √N (m - μ)/S is t-distributed with N - 1 degrees of freedom.

39 Similar to normal, but with larger spread (longer tails) Corresponds to additional uncertainty with using sample variance 39

40 When σ² is not known, replace it with the sample variance S² = Σ_t (x_t - m)² / (N - 1), so that √N (m - μ)/S ~ t_{N-1} and P(m - t_{α/2,N-1} S/√N < μ < m + t_{α/2,N-1} S/√N) = 1 - α. E.g., t_{0.025,9} = 2.685/2.262 and t_{0.025,29} = 2.364/2.045 (2-tailed).

41 Example: from the sample {x_t}, compute the mean m and the sample standard deviation S (here S = 0.022, df = N - 1); the 95% confidence interval is m ± t_{0.025,N-1} S/√N, i.e., P(m - t_{α/2,N-1} S/√N < μ < m + t_{α/2,N-1} S/√N) = 1 - α.
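
The same interval computed with the sample standard deviation and the t quantile from scipy; the sample values are again invented:

import numpy as np
from scipy import stats

x = np.array([0.62, 0.58, 0.65, 0.61, 0.60, 0.63, 0.59, 0.64, 0.62, 0.60])  # made-up sample
N, m = len(x), x.mean()
S = x.std(ddof=1)                              # sample standard deviation (N - 1 in the denominator)
t_crit = stats.t.ppf(1 - 0.05 / 2, df=N - 1)   # t_{0.025,9}, about 2.262
half = t_crit * S / np.sqrt(N)
print(m - half, m + half)                      # 95% confidence interval for mu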

42 We want to claim a hypothesis H_1, e.g., H_1: error_D(h) < 0.10. Define the opposite of H_1 to be the null hypothesis H_0, e.g., H_0: error_D(h) >= 0.10. Perform an experiment collecting data about error_D(h). With what probability can we reject H_0?

43 Example: sample X = {x_t} of size N from N(μ, σ²); estimate the mean m = Σ_t x_t / N. We want to test whether μ equals some constant μ_0. Null hypothesis H_0: μ = μ_0; alternative hypothesis H_1: μ ≠ μ_0. Reject H_0 if m is too far from μ_0.

44 Example (cont.): we fail to reject H_0 with level of significance α if √N (m - μ_0)/σ lies in (-z_{α/2}, z_{α/2}), i.e., if μ_0 lies in the (1 - α) confidence interval around m. We reject H_0 if it falls outside this interval on either side (two-sided test).

45 Example (cont.): one-sided test H_0: μ <= μ_0 vs. H_1: μ > μ_0. Fail to reject H_0 with level of significance α if √N (m - μ_0)/σ lies in (-∞, z_α); reject H_0 if it falls outside this interval.

46 Example (cont.): if the variance σ² is unknown, use the sample variance S². The statistic is now described by the Student t distribution: √N (m - μ_0)/S ~ t_{N-1}. Fail to reject H_0 with level of significance α if this statistic lies in (-∞, t_{α,N-1}); reject H_0 if it falls outside this interval.

47 Example (cont.): H_0: μ <= μ_0 vs. H_1: μ > μ_0 (one-sided). With S = 0.022, df = N - 1 = 9, and t_{0.05,9} = 1.833, compute √N (m - μ_0)/S; since this value falls outside (-∞, 1.833), reject H_0.

48 Learn a classifier on the training set and test it on a test set V of size N. Assume the classifier makes an error with probability p, and let X = number of errors made by the classifier on V. X is described by the binomial distribution: P(X = j) = (N choose j) p^j (1 - p)^{N-j}.

49 Test the hypothesis H_0: p <= p_0 vs. H_1: p > p_0. Let e be the number of errors observed on V. Reject H_0 with significance α if P(X <= e) = Σ_{j=1}^{e} (N choose j) p_0^j (1 - p_0)^{N-j} >= 1 - α.

50 Single training/validation set: the binomial test. If the error probability is p_0, the probability that there are e errors or fewer in N validation trials is P(X <= e) = Σ_{j=1}^{e} (N choose j) p_0^j (1 - p_0)^{N-j}. Example: N = 100, e = 20. Accept H_0 if this probability is less than 1 - α.
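
The binomial test for the slide's N = 100 and e = 20, with an illustrative p_0 (the slide does not fix one):

from scipy.stats import binom

N, e, alpha = 100, 20, 0.05
p0 = 0.25                         # hypothesized error probability, chosen only for illustration
prob = binom.cdf(e, N, p0)        # P(X <= e) under H0: p = p0
print(prob)
print("accept H0" if prob < 1 - alpha else "reject H0")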

51 Normal approximation: the number of errors X is approximately normal with mean N p_0 and variance N p_0 (1 - p_0), so (X - N p_0)/√(N p_0 (1 - p_0)) ~ Z. Accept H_0 if this statistic, evaluated at X = e, is less than z_{1-α}.

52 Approximating X with the normal distribution: X is the sum of N independent random variables from the same distribution, so X/N is approximately normal for large N with mean p_0 and variance p_0 (1 - p_0)/N (central limit theorem): (X/N - p_0)/√(p_0 (1 - p_0)/N) ~ Z. Fail to reject H_0 (p <= p_0) with significance α if this statistic lies in (-∞, z_α) (e.g., z_{0.05} = 1.64); reject H_0 if it falls outside. Works well for Np >= 5 and N(1 - p) >= 5.

53 Example: recall the earlier example with error_S(h) = 0.3, N = |S| = 40, X = 12, and error_D(h) = p (unknown). Test H_0: p <= p_0 vs. H_1: p > p_0 with p_0 = 0.2 and α = 0.05. The statistic is (X/N - p_0)/√(p_0 (1 - p_0)/N) = (0.3 - 0.2)/√(0.2*0.8/40) ≈ 1.58, which lies in (-∞, 1.64) = (-∞, z_{0.05}), so we fail to reject H_0.
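
The same computation in code, using the numbers from this example:

import math

N, X, p0, z_alpha = 40, 12, 0.2, 1.64        # values from the example above
z = (X / N - p0) / math.sqrt(p0 * (1 - p0) / N)
print(round(z, 2))                           # about 1.58
print("reject H0" if z > z_alpha else "fail to reject H0")   # 1.58 < 1.64, so fail to reject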

54 Example (cont.): what is the 95% (α = 0.05) confidence interval around error_D(h) = p? Let p_0 = error_S(h) = 0.3. Then P(p_0 - z_{α/2} √(p_0 (1 - p_0)/N) < p < p_0 + z_{α/2} √(p_0 (1 - p_0)/N)) = 1 - α, i.e., 0.3 ± 1.96 √(0.3(0.7)/40) ≈ 0.3 ± 0.14, giving the interval (0.16, 0.44).

55 t test over multiple runs: evaluate the learner on K training/testing sets, yielding errors p_i, 1 <= i <= K. Let m = Σ_i p_i / K and S² = Σ_i (p_i - m)² / (K - 1); then √K (m - p_0)/S ~ t_{K-1}. Reject H_0 with significance α if this value is greater than t_{α,K-1}. Typically K is 10 or 30 (t_{0.05,9} = 1.83, t_{0.05,29} = 1.70).

56 K-fold cross-validated paired t test. Paired test: both learners get the same train/test sets; use K-fold CV to get K training/testing folds. p_i^1, p_i^2: errors of learners 1 and 2 on fold i; p_i = p_i^1 - p_i^2: paired difference on fold i. The null hypothesis is that p_i has mean 0: H_0: μ = 0 vs. H_1: μ ≠ 0. With m = Σ_i p_i / K and s² = Σ_i (p_i - m)² / (K - 1), √K m / s ~ t_{K-1}. Accept H_0 if this value lies in (-t_{α/2,K-1}, t_{α/2,K-1}).
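
A sketch of the paired test; the per-fold error rates below are made-up, and scipy's ttest_rel is printed as a cross-check of the hand-computed statistic:

import numpy as np
from scipy import stats

p1 = np.array([0.12, 0.10, 0.15, 0.11, 0.13, 0.09, 0.14, 0.12, 0.10, 0.13])  # learner 1, per fold
p2 = np.array([0.14, 0.13, 0.16, 0.12, 0.15, 0.11, 0.15, 0.14, 0.12, 0.15])  # learner 2, per fold
d = p1 - p2                                        # paired differences p_i
K = len(d)
t_stat = np.sqrt(K) * d.mean() / d.std(ddof=1)
p_value = 2 * stats.t.sf(abs(t_stat), df=K - 1)    # two-sided p-value
print(t_stat, p_value)
print(stats.ttest_rel(p1, p2))                     # the same paired test via scipy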

57 Tester:     weka.experiment.PairedCorrectedTTester
Analysing:  Percent_correct
Datasets:   8
Resultsets: 2
Confidence: 0.05 (two tailed)
Sorted by:  -
Date:       10/6/10 12:00 AM

Dataset                   (1) rules.OneR   (2) bayes
loan (100)                                     v
contact-lenses (100)
iris (100)
labor-neg-data (100)                           v
segment (100)                                  v
soybean (100)                                  v
weather (100)
weather.symbolic (100)
                               (v/ /*)     (4/4/0)

58 Be careful when comparing more than two learners: each comparison has probability α of yielding an incorrect conclusion (incorrectly rejecting the null hypothesis, i.e., incorrectly concluding learner A is better than learner B). The probability of at least one incorrect conclusion among c comparisons is 1 - (1 - α)^c. One approach: analysis of variance (ANOVA).
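
How quickly this compounds, for a few values of c:

alpha = 0.05
for c in (1, 5, 10, 20):
    print(c, 1 - (1 - alpha) ** c)   # probability of at least one false rejection in c comparisons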

59 Evaluating a learning algorithm: error of learned hypotheses (and other measures); K-fold and 5x2 cross validation; ROC curve and AUC; confidence in the error estimate; comparing two learners.
