Present Practice, Issues and Headaches


Slide 2: Present Practice, Issues and Headaches.
Classification is the data mining area par excellence. We will focus on binary targets of events/non-events. Research and applications include:
- Clinical data analysis (disease / no disease, with its insistence on odds ratios and logistic regression).
- Direct marketing: response / no response, attrition, etc.
- Recommender systems: interesting / uninteresting.
- Fraud, terrorism, ...
Issues and headaches (we can't cover them all in this lecture):
- Binary target training/estimation: the mixture of events and non-events.
- Obfuscating terminology.
- Model comparisons.
- Modeling methodologies confront unexpected issues: collinearity and separation in logistic regression (and neural networks?), smooth rather than step response functions in trees, etc.

Slide 3: Meaning of Probability Statements: Context Dependent.
Probability is measured on the interval (0, 1) and the methods are mechanical; the context is provided by the practitioner/analyst. E.g.:
1. A model estimates that a household has a 70% probability of responding to a credit card solicitation. The solicitation cost is minimal and the bad feeling of a non-responding customer is disregarded. Likely action: solicit.
2. A model estimates that the probability that the conference ceiling will fall on us right now is 40%. How many of you will stay until I finish reading this paragraph? Action: run for your life?
3. DNA matching asserts that the probability that male A is the father of the baby is 95%, i.e., 1 in 20 is a false positive. Action: A is the father?
The cost (profit) of implementing or not implementing the decision, even if not exactly quantifiable, is the most important element of the context.

Slide 4: Present Practice, Issues and Headaches.
There is advice on the 0/1 mixture of the target variable for estimation, but doubts persist. The terminology of ROC, precision, model choice, etc. is obfuscating. The concepts used to compare classification models are derived from different methods (trees, neural networks, etc.). Practices on separation, 0/1 balance, collinearity and variable selection are unclear. Collinearity is likely more of a bête noire than in the linear regression case, and it adds doubts about the stability of the predicted probabilities when scoring future databases. In short: all models produce predictions in probability form, and a decision has to be made. In the next pages: events = 1, non-events = 0.

Slide 5: In Short, Practical Issues.
Model comparison and selection based on some criterion. Pitfall: the null model should be an important part of the game, but it is often disregarded because we get paid to find something, not to find nothing. Cutoff selection for decision making. In both cases, the cost of a wrong decision can affect model and cutoff selection. Further, most applications focus on events that are a tiny minority of the database, yet in many cases both sides matter: public opinion, success/failure of a negotiation, etc. A further refinement (not developed here): soliciting a true responder is not necessarily profitable, because the responder may be a bad customer. Targeting can thus be refined to response / no response crossed with profitable / not profitable; actually three levels, since nobody cares about the unprofitable non-responder.

Slide 6: Not recommended. (Figure.)

Slide 7: Model Evaluation and Comparison.
- Model chi-square.
- Accuracy, percent correct predictions, ROCs, etc.
- Pseudo-R².
- Hosmer-Lemeshow.
Model chi-square (LRT): LRT = [−2 log likelihood(model 2)] − [−2 log likelihood(model 1)], typically comparing against the null model (model 2). It cannot indicate whether the model is useful, just that it is better than chance prediction.
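
As a concrete illustration, here is a minimal sketch (not from the lecture; data are simulated) of the LRT of a fitted logistic model against the intercept-only null, using statsmodels:

```python
# Minimal sketch: likelihood-ratio test of a logistic model vs. the null
# (intercept-only) model. Data and variable names are hypothetical.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                       # three fake predictors
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.25 * X[:, 1] - 1.0))))

fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
lrt = -2 * fit.llnull - (-2 * fit.llf)               # [-2logL(null)] - [-2logL(model)]
p_value = chi2.sf(lrt, fit.df_model)                 # df = number of slopes
print(f"LRT = {lrt:.2f}, p = {p_value:.4g}")
```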

Slide 8: Accuracy of Predictions (a big area).
Predictions are usually classified as events ("1") whenever the posterior probability exceeds the cutoff, else as non-events ("0"). Demsar (2006) shows that most algorithms are compared based on accuracy.
Classification rate: the proportion of events predicted as events (similarly for non-events). (Also called accuracy: the overall classification rate of events and non-events.)
Precision rate: the proportion of predicted events that are true events (similarly for non-events).

Slide 9: Classification (Confusion) Table.

              Predicted 0   Predicted 1   Total
   Real 0     A (TN)        B (FP)        A + B (Neg)
   Real 1     C (FN)        D (TP)        C + D (Pos)
   Total      A + C         B + D         A + B + C + D

Classification (accuracy) rate = 100 (A + D) / grand total.
Sensitivity = event classification (recall, hit) rate: TPR = 100 D / (C + D).
Specificity = non-event classification rate: TNR = 100 A / (A + B).
1 − Specificity = non-event misclassification (false alarm) rate: FPR = 100 B / (A + B).
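
A minimal sketch computing these rates from hypothetical A/B/C/D cell counts:

```python
# Hypothetical confusion-table cells, following the slide's A/B/C/D labels.
A, B = 850, 50        # real 0: TN, FP
C, D = 40, 60         # real 1: FN, TP

total       = A + B + C + D
accuracy    = 100 * (A + D) / total     # overall classification rate
sensitivity = 100 * D / (C + D)         # TPR = event recall (hit) rate
specificity = 100 * A / (A + B)         # TNR
fpr         = 100 * B / (A + B)         # false alarm rate = 100 - specificity
print(accuracy, sensitivity, specificity, fpr)   # 91.0 60.0 94.44... 5.55...
```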

Slide 10: Graphical Appreciation: Predicted Events vs. Events. (Figure.)
Precision = |Predicted_Events ∩ Events| / |Predicted_Events|.
Recall = |Predicted_Events ∩ Events| / |Events|.
F-measure = (β² + 1) · Precision · Recall / (β² · Precision + Recall).

Slide 11: More Terminology, Just Because.
Conditional probabilities:
TPR = P(pred pos | positive, i.e., event) = recall.
TNR = P(pred neg | negative, i.e., non-event).
FPR = P(pred pos | negative) = 1 − TNR.
FNR = P(pred neg | positive).
PPR = positive precision rate = purity = P(positive | pred pos) = TP / (TP + FP).
NPR = negative precision rate = P(negative | pred neg) = TN / (TN + FN).
Unconditional probabilities:
Prevalence = risk = P(positive); used mostly in clinical studies.
The F-measure is evenly balanced when β = 1; with the formula above it favors recall when β > 1 and precision otherwise. Used in text classification, information retrieval and language processing.
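
A minimal sketch of the F-measure from slide 10, with hypothetical precision and recall values:

```python
# F_beta = (beta^2 + 1) * precision * recall / (beta^2 * precision + recall),
# as defined on slide 10. Values below are hypothetical.
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    b2 = beta ** 2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)

p, r = 0.6, 0.9
print(f_measure(p, r, beta=1.0))   # 0.72, the balanced harmonic mean
print(f_measure(p, r, beta=2.0))   # ~0.82, pulled toward recall
print(f_measure(p, r, beta=0.5))   # ~0.64, pulled toward precision
```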

Slide 12: Classification Table as Goodness-of-Fit?
1. A good model may classify poorly. Hosmer and Lemeshow (2000, p. 157) show a model whose misclassification rate depends on the slope, not on model fit.
2. Classification is done by choosing a cutoff point on the posterior probability. It is well known that classification favors the majority group, which is independent of model fit. Thus, if P1 = .49 and P2 = .52 and the cutoff is .5, the observations are classified into different categories even though the probabilities are very close. This assumes a known and unchanging natural class distribution and that the cost of an FP error equals that of an FN error. It typically favors the majority class; but in most applications, the cost of misclassifying a "1" is higher.

Slide 13: Accuracy Can Mislead.
Example 1: assume "1" is the important class. Overall accuracy of the left model = 92.5%, overall accuracy of the right model = 97.5%, yet the right model misses all the 1's. (Confusion tables.)
Example 2: 80% accuracy in two models. If the test data set contains more 0's, the right model is better; if more 1's, the left model. (Confusion tables.)

Slide 14: Classification Table as Goodness-of-Fit? (cont.)
3. Models (probability distributions) estimated from different samples cannot be compared based on these tables, because the predicted probabilities are confounded by the distribution of probabilities in the original samples. These tables are useful only when classification is the main goal.
4. ROC curve: the cutoff probability is varied, and sensitivity (Y, TPR) is plotted against 1 − specificity (X, FPR). The area under the curve (AUROC) gives the percentage of event / non-event pairs in which the predicted probability of the event exceeds that of the non-event (the same as the Mann-Whitney U statistic, or the Wilcoxon rank test). Also related to the Gini coefficient: Gini = 2·AUROC − 1.
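
To make the pairwise reading of AUROC concrete, here is a minimal sketch (scores are hypothetical, not from the lecture) that counts concordant event / non-event pairs and recovers the Gini coefficient:

```python
# AUROC as the proportion of (event, non-event) pairs where the event gets
# the higher predicted probability; ties count half, as in Mann-Whitney U.
import numpy as np
from itertools import product

p_event     = np.array([0.9, 0.8, 0.6, 0.55])   # scores of events (y = 1)
p_non_event = np.array([0.7, 0.5, 0.4])          # scores of non-events (y = 0)

wins = sum((e > n) + 0.5 * (e == n) for e, n in product(p_event, p_non_event))
auroc = wins / (len(p_event) * len(p_non_event))
print(auroc)               # 10 of 12 pairs concordant -> 0.8333
print(2 * auroc - 1)       # Gini = 2 * AUROC - 1
```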

Slide 15: For a given FPR (the probability of a non-event being predicted as an event, e.g., .5), the ROC shows the corresponding TPR (the probability of predicting an event as an event, e.g., .81). The numbers along the curve: accuracy, cutoff, precision. (Figure.)

Slide 16: ROC: What Does It Mean?
For a randomly chosen responder/sick patient/attriter and another randomly chosen non-responder/healthy patient/non-attriter, AUROC measures the probability of identifying the event by way of the model alone. Thus, with no model and balanced 0/1, AUROC = 50%.
Direct marketing application: suppose a mailing database of 10,000 candidates. Expect a 10% response rate: if we mail everybody, we expect 1,000 responders. Assume a budget constraint that allows mailing just 3,500:
FPR · 9,000 + TPR · 1,000 = 3,500, or TPR = 3.5 − 9 · FPR.
From the ROC graph, locate the pair (FPR, TPR) that satisfies this equation, derive the cutoff point, and contact those above the cutoff (elaborated later on).
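
A minimal sketch of the budget computation under hypothetical simulated scores: mailing the top 3,500 of 10,000 fixes the cutoff, and the resulting (FPR, TPR) pair satisfies the budget line above:

```python
# Find the cutoff that exhausts a 3,500-piece mailing budget out of 10,000
# candidates (9,000 expected non-responders, 1,000 responders). Scores are
# simulated stand-ins for a model's posterior probabilities.
import numpy as np

rng = np.random.default_rng(1)
n0, n1 = 9_000, 1_000
scores = np.concatenate([rng.beta(2, 5, n0), rng.beta(5, 2, n1)])
y = np.concatenate([np.zeros(n0), np.ones(n1)])

cutoff = np.quantile(scores, 1 - 3_500 / len(scores))   # mail the top 3,500
mailed = scores >= cutoff
fpr, tpr = mailed[y == 0].mean(), mailed[y == 1].mean()
print(f"cutoff ~ {cutoff:.3f}, FPR = {fpr:.3f}, TPR = {tpr:.3f}")
print(f"budget check: FPR*9000 + TPR*1000 = {fpr * n0 + tpr * n1:.0f}")  # ~3,500
```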

Slide 17: ROC vs. Accuracy.
Accuracy is directly related to the error rate; the ROC to the ordering of the probability ranks. In general the ROC is a better measure of model performance than accuracy, but NOT ALWAYS. Ling and Zhang (2003) prove that AUC is statistically consistent with, and more discriminating than, accuracy. AUC and accuracy are statistically consistent to degree C if, whenever AUC indicates that model 1 is better than model 2, there is probability C that accuracy will agree. AUC is D times more discriminating than accuracy if it is D times more likely that AUC can differentiate between models 1 and 2.

Slide 18: Inference on ROCs.
Hanley and McNeil (1982) give a conservative SE of the ROC curve:
SE = sqrt{ [ AUROC·(1 − AUROC) + (n1 − 1)·(Q1 − AUROC²) + (n0 − 1)·(Q2 − AUROC²) ] / (n0 · n1) },
where Q1 = AUROC / (2 − AUROC), Q2 = 2·AUROC² / (1 + AUROC), and "1" denotes the event.
Next slide: an example of an (over-fitted) model with a seemingly grandiose ROC. Note that accuracy is above 97% initially and then declines to around 49%; note also the precision decline. Don't blame collinearity or any other bête noire for this.
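
A minimal sketch of the Hanley-McNeil standard error as given above (the AUROC and class counts are hypothetical):

```python
# Hanley-McNeil (1982) conservative SE for an estimated AUROC.
import math

def hanley_mcneil_se(auroc: float, n1: int, n0: int) -> float:
    """n1 = number of events, n0 = number of non-events."""
    q1 = auroc / (2 - auroc)
    q2 = 2 * auroc ** 2 / (1 + auroc)
    var = (auroc * (1 - auroc)
           + (n1 - 1) * (q1 - auroc ** 2)
           + (n0 - 1) * (q2 - auroc ** 2)) / (n0 * n1)
    return math.sqrt(var)

print(hanley_mcneil_se(0.80, n1=100, n0=900))   # ~0.027
```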

Slide 19: Don't blindly choose this model. (Figure.)

Slide 20: ROC as Model Selector.
The area under the ROC (AUROC) is used as the parameter for judging the better model (larger is better), although it is often more practical to estimate AUROC over a limited range of FPR. AUROC can be estimated non-parametrically via the trapezoidal rule:
AUC ≈ Σ_{i=2..m} ΔFPR_i · [ TPR(θ_i) + TPR(θ_{i−1}) ] / 2,
where ΔTPR_i = TPR(θ_i) − TPR(θ_{i−1}) and ΔFPR_i = FPR(θ_i) − FPR(θ_{i−1}).
AUROC is more easily estimated by (S − n1·(n1 + 1)/2) / (n0 · n1) (Hand & Till, 2001), where S = Σ r_{i,1} and r_{i,1} is the rank of the posterior probability of event observation i when the probabilities are sorted in increasing order; n1 is the number of events, n0 the number of non-events.
REMEMBER: AUROC is related to ranks, a non-parametric measure, and is not linked to R-squares.
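
A minimal sketch of the rank-based estimator quoted above (labels and scores are hypothetical; scipy's rankdata supplies ascending, tie-averaged ranks):

```python
# Rank-based AUROC: AUC = (S - n1*(n1 + 1)/2) / (n0 * n1), with S the rank
# sum of the events when all scores are sorted in increasing order.
import numpy as np
from scipy.stats import rankdata

y      = np.array([1, 1, 1, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.6, 0.55, 0.7, 0.5, 0.4])

ranks = rankdata(scores)                 # ascending ranks, ties averaged
n1, n0 = (y == 1).sum(), (y == 0).sum()
S = ranks[y == 1].sum()
auc = (S - n1 * (n1 + 1) / 2) / (n0 * n1)
print(auc)                               # 0.8333, matching the pairwise count
```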

Slide 21: ROC as Model Selector.
0.5 ≤ AUROC ≤ 1. Model comparison via ROC: outer ROC curves indicate better models, but ROCs can cross. One can then form the convex hull of the ROC curves and, for a specific FPR, disregard models lying below the convex hull of the other models. Also, a perfect AUROC (= 1) does not imply perfect classification, only that the posterior probability of every event observation exceeds that of every non-event observation; i.e., the ROC measures ranking, and observations can still be misclassified depending on the cutoff. Note that the ROC graph can be discrete if the algorithm predicts class membership instead of a probability (e.g., trees, discrete classifiers).

Slide 22: ROC as Model Selector.
The point (10, 40) is more conservative than the liberal (90, 99): conservatives make positive classifications only with strong evidence, to avoid FPs (remember, fewer FPs are associated with higher cutoffs). Conversely, liberals use weaker evidence to catch many positives, but incur many FPs. Since there are more negatives than positives in the original data, performance on the left-hand side of the ROC graph is the more interesting. The 45° line is the random classifier, which guesses TPs as often as FPs. Points below the line perform worse than random; negating (reversing) such a rule produces a point above the 45° line. The closer to the 45° line, the worse the performance. How close is close? We need inference. Note the declining accuracy (the numbers above the curve) as FPR increases, and the steeper ROC slope on the left side compared to the right: positives are easier to find than negatives in that region.

Slide 23: ROC as Model Selector.
The ROC is unaffected by class skewness/imbalance or by the cost distribution: TPR and FPR are each computed within a single real class, so their denominators are invariant to class skewness. Accuracy, precision, lift and the F-score are affected, because all of them mix counts from both real classes of the classification matrix. The same algorithm applied to two test data sets with different class balances shows the same ROC curves, but the precision-recall curve changes. In the logistic regression case, separation may imply AUROC = 1, yet the Wald CIs are too wide and the model is unreliable.
(Reminder: sensitivity = event recall (hit) rate, TPR = 100 D / (C + D); specificity = non-event classification rate, TNR = 100 A / (A + B); 1 − specificity = false alarm rate, FPR = 100 B / (A + B).)

Slide 24: Classification Table as Goodness-of-Fit? (cont. 1)
5. To visualize the cutoff point of highest separation, plot sensitivity and specificity (Y) against the probability cutoff points (X). The intersection indicates maximum separation (the KS test). In the graph below the optimum is 12%, with no costs specified. (Figure.)
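
A minimal sketch of this cutoff rule under hypothetical beta-distributed scores: scan cutoffs and take the one maximizing the KS distance TPR − FPR:

```python
# Max-KS cutoff: where sensitivity and specificity curves cross, i.e.
# where TPR - FPR = sensitivity - (1 - specificity) is largest.
import numpy as np

rng = np.random.default_rng(2)
scores0 = rng.beta(2, 5, 900)          # non-events
scores1 = rng.beta(5, 2, 100)          # events

cutoffs = np.linspace(0, 1, 501)
tpr = np.array([(scores1 >= c).mean() for c in cutoffs])
fpr = np.array([(scores0 >= c).mean() for c in cutoffs])
ks = tpr - fpr
best = cutoffs[np.argmax(ks)]
print(f"max-KS cutoff ~ {best:.3f}, KS = {ks.max():.3f}")
```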


Slide 26: Classification Table as Goodness-of-Fit? (cont. 2)
The optimal cutoff point minimizes the expected cost of misclassification. Often the costs are not known. When they are, the items of interest are usually C(i, j), i ≠ j, the cost of predicting i when it should have been j: C(0, 1) = CFN and C(1, 0) = CFP. Let π be the original proportion of events in the population. Then the optimal threshold minimizes the cost
(1 − π) · FPR · C(1, 0) + π · (1 − TPR) · C(0, 1).
Derivation of the minimal cost and cutoff:
Cavg = Co + CTP·P(TP) + CTN·P(TN) + CFP·P(FP) + CFN·P(FN),
where Co is the fixed modeling/testing cost, P(TP) = π·TPR, TNR = 1 − FPR and FNR = 1 − TPR. Substituting around:
Cavg = [ TPR·π·(CTP − CFN) ] + [ FPR·(1 − π)·(CFP − CTN) ] + Co + [ CTN·(1 − π) + CFN·π ].

Slide 27: Derivation of the Minimal Cost and Cutoff.
Typically CTP = CTN = 0, so
Cavg = −TPR·π·CFN + FPR·(1 − π)·CFP + Co + CFN·π = Co + CFN·π·(1 − TPR) + CFP·(1 − π)·FPR.
To minimize the cost, take the first derivative and use the ROC, writing TPR = ROC(FPR):
Cavg = [ ROC(FPR)·π·(CTP − CFN) ] + [ FPR·(1 − π)·(CFP − CTN) ] + Co + [ CTN·(1 − π) + CFN·π ],
dCavg/dFPR = (dROC/dFPR)·π·(CTP − CFN) + (1 − π)·(CFP − CTN) = 0,
and rearranging we get:
dROC/dFPR = (1 − π)·(CFP − CTN) / [ π·(CFN − CTP) ].

Slide 28: Derivation of the Minimal Cost and Cutoff.
dROC/dFPR = (1 − π)·(CFP − CTN) / [ π·(CFN − CTP) ]; for unitary costs of misclassification, the point at which dCavg/dFPR = 0 is where dROC/dFPR = (1 − π)/π (the KS-test maximum distance).
NB: in the above discussion π is implicit, but it is not visible in the ROC curve. If CTN = CTP = 0, then for a given π a higher CFP relative to CFN shifts the optimal point to the left of the ROC, and a higher cutoff point is chosen (thus avoiding false positives more often). Vice versa for a higher relative CFN.
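
A minimal sketch of the resulting rule (hypothetical scores and costs, with CTP = CTN = 0 as above): evaluate the expected cost over a grid of cutoffs and take the minimizer:

```python
# Cost-minimizing cutoff: minimize (1 - pi)*FPR*C_FP + pi*(1 - TPR)*C_FN.
import numpy as np

rng = np.random.default_rng(3)
scores0 = rng.beta(2, 5, 900)           # non-events
scores1 = rng.beta(5, 2, 100)           # events
pi = len(scores1) / (len(scores0) + len(scores1))   # prior P(event)
C_FP, C_FN = 1.0, 5.0                   # missing an event costs 5x a false alarm

cutoffs = np.linspace(0, 1, 501)
fpr = np.array([(scores0 >= c).mean() for c in cutoffs])
tpr = np.array([(scores1 >= c).mean() for c in cutoffs])
cost = (1 - pi) * fpr * C_FP + pi * (1 - tpr) * C_FN
best = cutoffs[np.argmin(cost)]
print(f"optimal cutoff ~ {best:.3f}, expected cost = {cost.min():.4f}")
```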

Slide 29: Choosing the cutoff via maximum KS. (Figure.)

Slide 30: Five priors, four optimal cutoffs and minimal costs (two overlap), FN = FP. (Figure.)

Slide 31: Cost of FN / FP = … (Figure.)

Slide 32: Cost of FN / FP = … (Figure.)

Slide 33: Cost of FN / FP = … (Figure.)

Slide 35: Positive precision rates depend on the priors. (Figure.)

Slide 36: Precision-Recall Curves.
There are many situations in which priors and costs are unknown. It could also be that priors and costs actually vary, either in time or across subpopulations (what, you thought we'd make it easy on you?). E.g., the prevalence of certain diseases differs across races or ethnic groups. In these cases the ROC and its threshold cutoff are not so reliable, because the ROC does not reflect changes in the priors even when we want it to (remember, the denominators of TPR and FPR do not change when priors or costs vary).
Aside: in clinical work one standard is to study the ROC to the left of FPR = .05, the so-called partial ROC. Cai and Dodd (web) studied PSA at 3 years and 6 months before the onset of prostate cancer: for FPR ≤ 2%, TPR ≈ 30% at T = 3 years and ≈ 57% at T = 6 months, so PSA does not provide enough information.

Slide 37: Precision-recall curve. (Figure.)

Slide 38: Cutoff set by the equality between the precision curve and TPR. (Figure.)

Slide 40: Cutoff set by maximum cumulative profit. (Figure.)

Slide 41: Cutoff set by maximum cumulative profit. (Figure.)

Slide 42: Can We Link All This Stuff Together? Yes (Alvarez, 2002).
Let r = recall, p = precision, a = accuracy, π = prior probability. Then
π·r + (π + a − 1)·p = 2·π·p·r,
so that π = p·(1 − a) / (r + p − 2·p·r) (useful, e.g., when unsure of the original π).
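
A minimal sketch checking Alvarez's identity on a hypothetical confusion table, and recovering π from r, p and a:

```python
# Verify pi*r + (pi + a - 1)*p = 2*pi*p*r and pi = p*(1 - a)/(r + p - 2*p*r)
# on hypothetical TP/FP/FN/TN counts.
TP, FP, FN, TN = 60, 50, 40, 850

n  = TP + FP + FN + TN
r  = TP / (TP + FN)          # recall
p  = TP / (TP + FP)          # precision
a  = (TP + TN) / n           # accuracy
pi = (TP + FN) / n           # prior P(event)

lhs = pi * r + (pi + a - 1) * p
rhs = 2 * pi * p * r
print(abs(lhs - rhs) < 1e-12)                     # identity holds
print(p * (1 - a) / (r + p - 2 * p * r), pi)      # recovers the prior, 0.1
```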

Slide 44: Balanced / Unbalanced Target: Rare Events.
Typical situation: a binary dependent variable with far fewer 1's than 0's: fraud, extreme diseases, oil spills. Logistic regression, for instance, underestimates the probability of rare events in this case. There is also a tendency to create enormous databases just to contain the rares. With rare events in logistic regression, most observations yield small probabilities; but for Y = 1 the probabilities are larger, so π_i·(1 − π_i) is larger for observations with Y = 1, their variance (its inverse) is smaller, and additional 1's bring in more information than 0's (King and Zeng, 2001). Logistic covariance matrix:
V(β̂) = [ Σ_{i=1..n} π_i·(1 − π_i)·x_i·x_i′ ]^{−1}.

Slide 45: Balanced/Unbalanced Target: Rare Events.
In the case of a rare event, the Y = 1 density is very poorly estimated on its left tail relative to the Y = 0 density on its right tail, so the threshold on X used to classify Y sits too far to the right. (Figure.)

Slide 46: Balanced/Unbalanced Target (cont. 1).
1) Model estimation typically uses samples.
2) Estimation assumption: the training class distribution of the target matches the natural distribution.
3) But classifiers built from unbalanced samples usually perform poorly on the minority class; worse if it is costlier to misclassify minority cases.
4) Some algorithms cannot use cost information effectively.
5) Weiss and Provost (2001): replicating the natural distribution in the sample is not necessarily good practice for estimation.
Present practice ("1" = minority class, the class of interest):
1) Under-sample "0": throws out potentially useful data, with a danger of sample bias: do not select on X differently for Y = 0 and Y = 1. Example: Y = 1 are cancer patients and Y = 0 is a random sample from the U.S. population. But the Y = 1 patients sought health care (X: found a medical specialist, had the right tests, etc.), so the Y = 0 sample should be drawn from patients who sought treatment and had no cancer.

Slide 47: Balanced/Unbalanced Target (cont. 2).
2) Over-sample "1": increases training data size and estimation time, and typically makes exact copies of the 1's, so over-fitting is probable. Alternative: make imperfect copies (Auslender 2000, 2001). If the data are unbalanced and minority misclassification is more costly, minimize cost instead of error rate by factoring in these costs.
Weiss and Provost results (with classification trees, C4.5):
1) Rules predicting "1" have a higher error rate than those predicting "0", because:
2) Test-data 1's are misclassified more often than 0's, because:
a) the test data contain more 0's;
b) algorithms are sometimes strongly affected by the initial marginal priors;
c) algorithms cannot learn the boundaries of the "1" class with relatively few examples.

Slide 48: Balanced/Unbalanced Target (cont. 4): the Logistic Regression Case.
1) Correction according to the prior proportion of 1's: the βs of the predictors are consistent, but β0 needs a correction involving δ = the true proportion and α = the sample proportion of 1's:
β0 (corrected) = β̂0 − ln[ ((1 − δ)/δ) · (α/(1 − α)) ].
NOTE: not robust to model misspecification.
2) Weighting in the estimation by α (Y = 1) and 1 − α (Y = 0). Advantages: robust to misspecification. Disadvantages: 1) the usual method of computing standard errors is biased; 2) rare-event finite-sample corrections have not been developed for weighting (see King and Zeng, 2001 for the full discussion).
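
A minimal sketch of the prior correction of the intercept (all values hypothetical):

```python
# beta0_corrected = beta0_hat - ln[((1 - delta)/delta) * (alpha/(1 - alpha))],
# where delta is the true event proportion and alpha the sample proportion.
import math

def corrected_intercept(beta0_hat: float, delta: float, alpha: float) -> float:
    return beta0_hat - math.log(((1 - delta) / delta) * (alpha / (1 - alpha)))

# e.g. a model fit on a balanced 50/50 sample when the true event rate is 2%:
print(corrected_intercept(beta0_hat=-0.1, delta=0.02, alpha=0.50))  # ~ -3.99
```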

Slide 49: Balanced/Unbalanced Target (cont. 5).
Re-balancing trees (Auslender, 1998, never finished): create samples with 0/1 percentages of 45/55, 46/54, ..., 54/46, 55/45. Typically the upper set of levels is similar or identical across all samples (split values and variables). The lower layer typically contains similar variables being split, sometimes in a different hierarchical order: variable 1 splits at level 4 in the 45/55 sample and at level 5 in the 50/50 sample, while variable 2 behaves reciprocally. Conclusion: the top levels are the core of the tree, and the middle level still provides strong information; after that, the information is not reliable. A similar approach is possible for logistic regression in the context of collinearity.

Slide 50: Pseudo-R-Square.
The proportion of variation (?) explained by the model. McFadden's pseudo-R² statistic:
McFadden's R² = 1 − [ LL(model 1) / LL(model 2) ] = 1 − [ −2·LL(model 1) / −2·LL(model 2) ],
where model 2 is usually the null model (intercept only). R² is a scalar measure between 0 and (somewhat close to) 1. (Others: Nagelkerke, Efron's, McKelvey & Zavoina.)
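
A minimal sketch of McFadden's pseudo-R² on simulated data (statsmodels reports the same quantity as prsquared):

```python
# McFadden's pseudo-R^2 = 1 - LL(model)/LL(null), on hypothetical data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 2))
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - 1.0))))

fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
mcfadden = 1 - fit.llf / fit.llnull
print(f"McFadden R^2 = {mcfadden:.3f}")
```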

Slide 51: Hosmer-Lemeshow Fit (2000, p. 148).
Assume 5 binary predictors in the model, so the maximum number of covariate patterns is 2^5 = 32; assume only J = 8 patterns exist. Pearson-based measures of fit are distributed asymptotically as chi-square with (J − p − 1) degrees of freedom. Problem: as n increases with J = n, p increases at the same rate as n, so the degrees of freedom are wrong. Proposal: create patterns by grouping on percentiles of the posterior predicted distribution. By simulation, if g = 10 percentiles (deciles) are chosen, the statistic is distributed as chi-square with (g − 2) degrees of freedom when J = n.
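
A minimal sketch of the decile-of-risk version of the test (the fitted probabilities are simulated stand-ins for a model's output):

```python
# Hosmer-Lemeshow test: group by deciles of predicted probability, compare
# observed vs. expected events, chi-square with g - 2 degrees of freedom.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
p_hat = rng.uniform(0.01, 0.99, 2000)   # stand-in for fitted probabilities
y = rng.binomial(1, p_hat)              # well-calibrated by construction

g = 10
groups = np.array_split(np.argsort(p_hat), g)   # deciles of predicted risk
hl = 0.0
for idx in groups:
    o, e, n = y[idx].sum(), p_hat[idx].sum(), len(idx)
    pbar = e / n
    hl += (o - e) ** 2 / (n * pbar * (1 - pbar))
print(f"HL = {hl:.2f}, p = {chi2.sf(hl, g - 2):.3f}")   # should not reject here
```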

Slide 53: Ponderings.
Most business applications do not provide cost/profit information, and decisions lack this vital input. It is even possible that the event profit so far exceeds the non-event cost that it is better to target the entire population, with the predicted probabilities merely indicating one ordering. Most applications do not focus on prediction but on classification, specifically TPR = recall = hit rate. But from the precision-recall curve, a high TPR can be associated with low precision: real-world prediction rates could be low even when classification rates are high. It is the real-world predictions that matter, not how well the model performed during classification. In the absence of real cost/profit information, don't wait for the ceiling to fall on you.
