Slides for Data Mining by I. H. Witten and E. Frank

Similar documents
Stats notes Chapter 5 of Data Mining From Witten and Frank

How do we compare the relative performance among competing models?

Data Mining. Chapter 5. Credibility: Evaluating What s Been Learned

Chapter 23. Inferences About Means. Monday, May 6, 13. Copyright 2009 Pearson Education, Inc.

Lecture on Null Hypothesis Testing & Temporal Correlation

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests

Introduction to Supervised Learning. Performance Evaluation

6.4 Type I and Type II Errors

LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2

Section 9.4. Notation. Requirements. Definition. Inferences About Two Means (Matched Pairs) Examples

Confidence intervals

Epidemiology Wonders of Biostatistics Chapter 11 (continued) - probability in a single population. John Koval

Note that we are looking at the true mean, μ, not y. The problem for us is that we need to find the endpoints of our interval (a, b).

Evaluating Hypotheses

Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing

Statistical Inference for Means

Ch. 7. One sample hypothesis tests for µ and σ

Week 1 Quantitative Analysis of Financial Markets Distributions A

7.2 One-Sample Correlation ( = a) Introduction. Correlation analysis measures the strength and direction of association between

Chapter 4: An Introduction to Probability and Statistics

Visual interpretation with normal approximation

Note that we are looking at the true mean, μ, not y. The problem for us is that we need to find the endpoints of our interval (a, b).

Chapter 23. Inference About Means

Confidence Interval Estimation

Topic 15: Simple Hypotheses

ECO220Y Review and Introduction to Hypothesis Testing Readings: Chapter 12

Estimating the accuracy of a hypothesis Setting. Assume a binary classification setting

Summary: the confidence interval for the mean (σ 2 known) with gaussian assumption

2008 Winton. Statistical Testing of RNGs

Stats Review Chapter 14. Mary Stangler Center for Academic Success Revised 8/16

Ling 289 Contingency Table Statistics

Hypothesis Tests Solutions COR1-GB.1305 Statistics and Data Analysis

Swarthmore Honors Exam 2012: Statistics

CENTRAL LIMIT THEOREM (CLT)

The Chi-Square Distributions

QUANTITATIVE TECHNIQUES

Chapter 24. Comparing Means. Copyright 2010 Pearson Education, Inc.

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras. Lecture 11 t- Tests

The Chi-Square Distributions

Chapter 7: Hypothesis Testing

The t-statistic. Student s t Test

Student s t-distribution. The t-distribution, t-tests, & Measures of Effect Size

Session 8: Statistical Analysis of Measurements

Maximum-Likelihood Estimation: Basic Ideas

INTERVAL ESTIMATION AND HYPOTHESES TESTING

Last few slides from last time

PHP2510: Principles of Biostatistics & Data Analysis. Lecture X: Hypothesis testing. PHP 2510 Lec 10: Hypothesis testing 1

(1) Introduction to Bayesian statistics

Statistics Part IV Confidence Limits and Hypothesis Testing. Joe Nahas University of Notre Dame

STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots. March 8, 2015

Hypothesis Testing in Action

Discrete distribution. Fitting probability models to frequency data. Hypotheses for! 2 test. ! 2 Goodness-of-fit test

Hypothesis Testing in Action: t-tests

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

CHI SQUARE ANALYSIS 8/18/2011 HYPOTHESIS TESTS SO FAR PARAMETRIC VS. NON-PARAMETRIC

Hypothesis Testing. ECE 3530 Spring Antonio Paiva

Fin285a:Computer Simulations and Risk Assessment Section 2.3.2:Hypothesis testing, and Confidence Intervals

ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary

Probability and Statistics. Joyeeta Dutta-Moscato June 29, 2015

Lecture 8 Hypothesis Testing

COSC 341 Human Computer Interaction. Dr. Bowen Hui University of British Columbia Okanagan

GROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION

A proportion is the fraction of individuals having a particular attribute. Can range from 0 to 1!

STAT 285 Fall Assignment 1 Solutions

Machine Learning: Evaluation

Probability and Statistics

Distribution-Free Procedures (Devore Chapter Fifteen)

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.

The Student s t Distribution

Part 3: Parametric Models

Performance Evaluation and Comparison

Design of Engineering Experiments Part 2 Basic Statistical Concepts Simple comparative experiments

Power of a hypothesis test

Introductory Econometrics. Review of statistics (Part II: Inference)

Math 180B Problem Set 3

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies

Performance Evaluation and Hypothesis Testing

Background to Statistics

10/4/2013. Hypothesis Testing & z-test. Hypothesis Testing. Hypothesis Testing

Probability and Statistics. Terms and concepts

Chapter 23: Inferences About Means

Random Variable. Discrete Random Variable. Continuous Random Variable. Discrete Random Variable. Discrete Probability Distribution

Chapter 26: Comparing Counts (Chi Square)

Part III: Unstructured Data

Objective - To understand experimental probability

Confidence Intervals. - simply, an interval for which we have a certain confidence.

Statistical Analysis How do we know if it works? Group workbook: Cartoon from XKCD.com. Subscribe!

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

CS 543 Page 1 John E. Boon, Jr.

EC2001 Econometrics 1 Dr. Jose Olmo Room D309

Linear Models: Comparing Variables. Stony Brook University CSE545, Fall 2017

Multiple t Tests. Introduction to Analysis of Variance. Experiments with More than 2 Conditions

Review. One-way ANOVA, I. What s coming up. Multiple comparisons

1 Descriptive statistics. 2 Scores and probability distributions. 3 Hypothesis testing and one-sample t-test. 4 More on t-tests

Frequentist Statistics and Hypothesis Testing Spring

Outline for Today. Review of In-class Exercise Bivariate Hypothesis Test 2: Difference of Means Bivariate Hypothesis Testing 3: Correla

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.

Testing Research and Statistical Hypotheses

Part 3: Parametric Models

Transcription:

Slides for Data Mining by I. H. Witten and E. Frank

Predicting performance Assume the estimated error rate is 5%. How close is this to the true error rate? Depends on the amount of test data Prediction is just like tossing a (biased) coin Head is a success, tail is an error In statistics, a succession of independent events like this is called a Bernoulli process Statistical theory provides us with confidence intervals for the true underlying proportion 9

Confidence intervals We can say: p lies within a certain specified interval with a certain specified confidence Example: S=750 successes in N=1000 trials Estimated success rate: 75% How close is this to true success rate p? Answer: with 80% confidence p [73.,76.7] Another example: S=75 and N=100 Estimated success rate: 75% With 80% confidence p [69.1,80.1] 10

Mean and variance Mean and variance for a Bernoulli trial: p, p (1 p) Expected success rate f=s/n Mean and variance for f : p, p (1 p)/n For large enough N, f follows a Normal distribution c% confidence interval [ X ] for random variable with 0 mean is given by: Pr[" X ] = With a symmetric distribution: Pr[ # $ X $ ] = 1# " Pr[ X c ] 11

Confidence limits Confidence limits for the normal distribution with 0 mean and a variance of 1: Pr[X ] 0.1% 3.09 0.5%.58 Thus: 1 0 1 1.65 1% 5% 10% 0% 40% Pr[ " 1.65 X 1.65] = 90%.33 1.65 1.8 0.84 0.5 To use this we have to reduce our random variable f to have 0 mean and unit variance 1

13 Transforming f Transformed value for f : (i.e. subtract the mean and divide by the standard deviation) Resulting equation: Solving for p : N p p p f ) / 1 ( c N p p p f = " # $ % & ' ( ( ' ( ) / (1 Pr " # $ % & + " # $ $ % & + ' ± + = N N N f N f N f p 1 4

Examples f = 75%, N = 1000, c = 80% (so that = 1.8): f = 75%, N = 100, c = 80% (so that = 1.8): Note that normal distribution assumption is only valid for large N (i.e. N > 100) f = 75%, N = 10, c = 80% (so that = 1.8): (should be taken with a grain of salt) p [0.73,0.767] p [0.691,0.801] p [0.549,0.881] 14

Comparing data mining schemes Frequent question: which of two learning schemes performs better? Note: this is domain dependent Obvious way: compare 10-fold CV estimates Problem: variance in estimate Variance can be reduced using repeated CV However, we still don t know whether the results are reliable 5

Significance tests Significance tests tell us how confident we can be that there really is a difference Null hypothesis: there is no real difference Alternative hypothesis: there is a difference A significance test measures how much evidence there is in favor of rejecting the null hypothesis Let s say we are using 10-fold CV Question: do the two means of the 10 CV estimates differ significantly? 6

Paired t-test Student s t-test tells whether the means of two samples are significantly different Take individual samples using crossvalidation Use a paired t-test because the individual samples are paired The same CV is applied twice William Gosset Born: 1876 in Canterbury; Died: 1937 in Beaconsfield, England Obtained a post as a chemist in the Guinness brewery in Dublin in 1899. Invented the t-test to handle small samples for quality control in brewing. Wrote under the name "Student". 7

Distribution of the means x 1 x x k and y 1 y y k are the k samples for a k- fold CV m x and m y are the means With enough samples, the mean of a set of independent samples is normally distributed Estimated variances of the means are σ x /k and σ y /k If µ x and µ y are the true means then are approximately normally distributed with mean 0, variance 1 m x " µ x x / k m y " µ y y / k 8

Student s distribution With small samples (k < 100) the mean follows Student s distribution with k 1 degrees of freedom Confidence limits: 9 degrees of freedom normal distribution Pr[X ] Pr[X ] 0.1% 4.30 0.1% 3.09 0.5% 3.5 0.5%.58 1%.8 1%.33 5% 1.83 5% 1.65 10% 1.38 10% 1.8 0% 0.88 0% 0.84 9

Distribution of the differences Let m d = m x m y The difference of the means (m d ) also has a Student s distribution with k 1 degrees of freedom Let σ d be the variance of the difference The standardied version of m d is called the t-statistic: md t = / k d We use t to perform the t-test 30

Performing the test Fix a significance level α If a difference is significant at the α% level, there is a (100-α)% chance that there really is a difference Divide the significance level by two because the test is two-tailed I.e. the true difference can be +ve or ve Look up the value for that corresponds to α/ If t or t then the difference is significant I.e. the null hypothesis can be rejected 31

Unpaired observations If the CV estimates are from different randomiations, they are no longer paired (or maybe we used k -fold CV for one scheme, and j -fold CV for the other one) Then we have to use an un paired t-test with min(k, j) 1 degrees of freedom The t-statistic becomes: t = m d d / k t = m x " x + k m y y j 3