Accuracy assessment methods and challenges

Giles M. Foody, School of Geography, University of Nottingham
giles.foody@nottingham.ac.uk

Background
- Need for accuracy assessment established.
- Considerable progress: now see use of probability sampling, provision of confidence intervals/SE etc.
- BUT challenges remain, as major errors, biases and uncertainties remain.

Challenges include
- Class definition (what is a forest?)
- Definition of change: modification v conversion etc.
- Impacts of spatial mis-registration
- Inter-sensor radiometric calibration
- Variations in sensor properties (spatial resolution etc.)
- Impacts of time of image acquisition
- Required precision of estimation
- Rarity and sampling issues
- etc.
Here focus on issues connected with the ground reference data quality and size.

PART 1: Error matrix interpretation
Commonly evaluate accuracy with a basic binary confusion matrix.

Popular measures, e.g.

$$ \text{Sensitivity} = \text{Producer's accuracy} = \frac{TP}{TP + FN} $$

$$ \text{Prevalence} = \frac{TP + FN}{TP + FN + FP + TN} $$

Others (e.g. user's accuracy) may be derived.
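The measures above can be computed directly from the four cells of a binary confusion matrix. The following is a minimal sketch; the function name and the example counts are hypothetical, and user's accuracy is taken as TP / (TP + FP), which is the usual definition rather than something stated on the slide.

```python
def binary_accuracy_measures(tp, fn, fp, tn):
    """Derive common accuracy measures from a binary confusion matrix.

    tp, fn, fp, tn are cell counts, with the ground reference data
    defining which cases are "positive" (e.g. change).
    """
    total = tp + fn + fp + tn
    return {
        "producers_accuracy": tp / (tp + fn),   # sensitivity
        "users_accuracy": tp / (tp + fp),       # assumed standard definition
        "prevalence": (tp + fn) / total,
        "overall_accuracy": (tp + tn) / total,
    }

# Hypothetical counts for 1000 testing cases
measures = binary_accuracy_measures(tp=160, fn=100, fp=160, tn=580)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```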

A simple question?
How accurate is this classification (or estimates of change)?
- Is the producer's accuracy = 160/260 ≈ 62%?
- Is amount of change = 260/1000 = 26%?

NO, because the matrix might look like:
[Figure: error matrix as perceived]
but is actually:
[Figure: actual error matrix]

Occurs because ground data set is imperfect.
- Good news: can correct for ground data error.
- Note: here assumed conditional independence (trends are more complex and can be in a different direction if this assumption is invalid, and it will be invalid in many studies).

Impact on estimation

Real accuracy (%)               Perceived (%)
Ground data   Remote sensing    RS accuracy   Prevalence
90            80                62            26
95            90                76            23

Systematically underestimate accuracy of remote sensing change detection and overestimate amount of change.
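The effect of imperfect ground data can be illustrated with a small calculation under conditional independence. This is only a sketch under assumptions that are not stated in the slides: each "real accuracy" is treated as both the sensitivity and the specificity of that data source, and the true prevalence of change is set to 20%. With those assumed values the perceived producer's accuracy and prevalence come out at about 62% and 26% for the first case and about 76% and 23% for the second, consistent with the perceived values tabulated above.

```python
def perceived_measures(true_prevalence, ground_accuracy, rs_accuracy):
    """Perceived producer's accuracy and prevalence when the ground data
    are themselves imperfect.

    Assumptions (for illustration only): errors in the two data sets are
    conditionally independent, and each accuracy value applies equally
    to the "change" and "no change" classes.
    """
    t, g, r = true_prevalence, ground_accuracy, rs_accuracy
    # Proportion of cases the ground data label as change
    ground_pos = t * g + (1 - t) * (1 - g)
    # Proportion labelled as change by BOTH ground data and remote sensing
    both_pos = t * g * r + (1 - t) * (1 - g) * (1 - r)
    perceived_producers = both_pos / ground_pos
    perceived_prevalence = ground_pos
    return perceived_producers, perceived_prevalence

for ground, rs in [(0.90, 0.80), (0.95, 0.90)]:
    pa, prev = perceived_measures(0.20, ground, rs)
    print(f"ground={ground:.2f} rs={rs:.2f} -> "
          f"perceived accuracy={pa:.1%}, perceived prevalence={prev:.1%}")
```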

Impact of imperfect ground data
Systematic bias, e.g.
- Underestimate producer's accuracy.
- Typically overestimate prevalence (e.g. amount of change).
Magnitude of bias can be very large even if ground data set is highly accurate.
Can correct/compensate for ground data error.
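One way to see that the bias is correctable is the Rogan-Gladen estimator, a standard adjustment of an apparent prevalence for a reference with known sensitivity and specificity. It is offered here as an illustrative sketch, not as the correction method used in the talk; the function name and the example values are assumptions.

```python
def rogan_gladen_prevalence(apparent_prevalence, sensitivity, specificity):
    """Adjust an apparent prevalence for known error in the reference data.

    Uses the Rogan-Gladen estimator:
        (apparent + specificity - 1) / (sensitivity + specificity - 1)
    The result is clamped to [0, 1]. Requires sensitivity + specificity > 1.
    """
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent_prevalence + specificity - 1) / denominator
    return min(max(estimate, 0.0), 1.0)

# Hypothetical case: ground data that are 90% accurate for both classes
# report 26% change.
corrected = rogan_gladen_prevalence(0.26, sensitivity=0.90, specificity=0.90)
print(f"corrected prevalence: {corrected:.3f}")
```

The correction is only as good as the sensitivity and specificity supplied for the ground data, which in practice must themselves be estimated.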

PART 2: Comparisons
- Often compare (e.g. accuracy over time, change rates between regions).
- Based on comparison of proportions.
- Must design an accuracy assessment programme to meet its objectives.
- One key concern is the size of the testing set.
  - Too large: any non-zero difference will appear statistically significant.
  - Too small: programme may fail to detect an important difference.

Sample size determination
Often based on precision to estimate a proportion:

$$ p \pm h = p \pm z_{\alpha/2}\,(SE), \qquad SE = \sqrt{\frac{p(1-p)}{n}}, \qquad n = \frac{z_{\alpha/2}^{2}\, p(1-p)}{h^{2}} $$
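As a worked illustration of the precision-based formula n = z²(α/2) p(1 − p) / h², the sketch below computes the number of testing cases for a planned accuracy and half-width. The function name and the example values (85% accuracy, ±5% half-width) are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_precision(p, h, alpha=0.05):
    """Sample size needed to estimate a proportion p to within +/- h
    at confidence level 1 - alpha (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return ceil(z ** 2 * p * (1 - p) / h ** 2)

# Hypothetical planning values: expected accuracy 85%, half-width 5%
print(sample_size_for_precision(p=0.85, h=0.05))
```

Halving the half-width roughly quadruples the required sample, since h enters the formula squared.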

BUT
Aim is often not to estimate accuracy to a given precision but to use it in a comparative analysis:
- comparison against a target
- comparison against another accuracy (e.g. classifier comparison).
Need to consider additional properties.

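Comparison
Very common: v. a target, or classifier evaluation, e.g.

$$ z = \frac{\hat{\kappa}_1 - \hat{\kappa}_2}{\sqrt{\hat{\sigma}^2_{\hat{\kappa}_1} + \hat{\sigma}^2_{\hat{\kappa}_2}}} $$

BUT often inappropriate & pessimistically biased.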

Comparative analysis
Comparative analyses often based on hypothesis testing, e.g.
- H0: no difference in accuracy
- H1: the accuracy values differ
Two types of error:
- Type I: when H1 is incorrectly accepted (declare a difference as being significant when it is not).
- Type II: when H0 is incorrectly accepted (fail to detect a meaningful difference that does exist).

Type I error
H1 is incorrectly accepted (declare a difference as being significant when it is not).
Probability of making a Type I error is expressed as the significance level, α.
Commonly set α = 0.05 (i.e. a 5% chance of inferring a significant difference exists when actually there is no difference).

Type II error
H0 is incorrectly accepted (fail to detect a meaningful difference that does exist).
Probability of making a Type II error is β and related to the power of the test (1 − β).
Type I errors typically viewed as 4 times more important than Type II, so commonly β = 0.2, or (1 − β) = 0.8.

If (1 − β) = 0.8: 80% chance of declaring a difference that exists as being significant. Is 0.8 adequate?
Many studies fail to detect a significant difference: did the study have sufficient power?
Tests using small sample sizes are often underpowered.
Difficult to interpret non-significant results (is there really no difference, or did the test just fail to identify it?).
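To make the point about underpowered tests concrete, the sketch below estimates the power of a two-sided comparison of two independent proportions for a given sample size per group. It assumes a normal approximation without continuity correction and equal sample sizes; the function name and the example accuracies (85% v 90%) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate power (1 - beta) for detecting a difference between
    two independent proportions with n cases per sample (two-sided test,
    normal approximation, no continuity correction)."""
    norm = NormalDist()
    z_alpha2 = norm.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    q_bar = 1 - p_bar
    z_beta = (abs(p2 - p1) * sqrt(n) - z_alpha2 * sqrt(2 * p_bar * q_bar)) \
        / sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return norm.cdf(z_beta)

# Hypothetical comparison of 85% v 90% accuracy at several sample sizes
for n in (50, 100, 300, 686, 1000):
    print(n, round(power_two_proportions(0.85, 0.90, n), 2))
```

Under these assumptions a sample of 100 per classification gives well under a one-in-four chance of detecting the 5% difference, while a few hundred more cases are needed to reach the conventional 0.8.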

Estimating sample size
To determine sample size need to consider:
- Significance level: α
- Power: (1 − β)
- Effect size: minimum meaningful difference.

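e.g. common remote sensing scenario, v target and with continuity correction:

$$ n' = \left[ \frac{z_{\alpha}\sqrt{P_0(1-P_0)} + z_{\beta}\sqrt{P_1(1-P_1)}}{P_1 - P_0} \right]^2 $$

$$ n = \frac{n'}{4}\left( 1 + \sqrt{1 + \frac{2}{n'\,|P_1 - P_0|}} \right)^2 $$

where P_0 is the target accuracy and P_1 the accuracy under the alternative.

Use acquired data to test for difference using:

$$ z = \frac{|p - P_0| - 1/(2n)}{\sqrt{P_0 Q_0 / n}}, \qquad Q_0 = 1 - P_0 $$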
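The comparison against a target can be sketched in code as follows. This assumes the standard continuity-corrected sample size formula for testing a single proportion, n' = [(z_α √(P0(1 − P0)) + z_β √(P1(1 − P1))) / (P1 − P0)]², followed by n = (n'/4)(1 + √(1 + 2/(n'|P1 − P0|)))², with a one-sided critical value. The function names and the example values (target 85%, alternative 90%) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def sample_size_vs_target(p0, p1, alpha=0.05, power=0.8):
    """Return (n_prime, n): uncorrected and continuity-corrected sample
    sizes for testing an accuracy against a target p0 when the smallest
    meaningful alternative is p1."""
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha)   # one-sided test (assumption)
    z_beta = norm.inv_cdf(power)
    delta = abs(p1 - p0)                # effect size
    n_prime = ((z_alpha * sqrt(p0 * (1 - p0))
                + z_beta * sqrt(p1 * (1 - p1))) / delta) ** 2
    n = n_prime / 4 * (1 + sqrt(1 + 2 / (n_prime * delta))) ** 2
    return n_prime, n

def z_vs_target(p_hat, p0, n):
    """Continuity-corrected z statistic for an observed accuracy p_hat
    from n testing cases against the target p0."""
    return (abs(p_hat - p0) - 1 / (2 * n)) / sqrt(p0 * (1 - p0) / n)

n_prime, n = sample_size_vs_target(p0=0.85, p1=0.90)
print(f"n' = {n_prime:.1f}, n = {n:.1f}")
```

With these hypothetical inputs the required testing set is on the order of 300 cases; shrinking the effect size raises it sharply because the difference enters the formula squared.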

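e.g. common scenario, v another accuracy and with continuity correction:

$$ n' = \frac{\left[ z_{\alpha/2}\sqrt{2\bar{P}\bar{Q}} + z_{\beta}\sqrt{P_1 Q_1 + P_2 Q_2} \right]^2}{(P_2 - P_1)^2} $$

$$ n = \frac{n'}{4}\left( 1 + \sqrt{1 + \frac{4}{n'\,|P_2 - P_1|}} \right)^2 $$

where Q = 1 − P and \bar{P} = (P_1 + P_2)/2.

Use acquired data to test for difference using:

$$ z = \frac{|p_1 - p_2| - \tfrac{1}{2}\left(\tfrac{1}{n_1} + \tfrac{1}{n_2}\right)}{\sqrt{\bar{p}(1-\bar{p})\left(\tfrac{1}{n_1} + \tfrac{1}{n_2}\right)}} $$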
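The comparison of two independent accuracies can be sketched the same way. This assumes the standard continuity-corrected formula for two independent proportions with equal sample sizes, n' = [z_{α/2} √(2P̄Q̄) + z_β √(P1Q1 + P2Q2)]² / (P2 − P1)² and n = (n'/4)(1 + √(1 + 4/(n'|P2 − P1|)))², and a pooled, continuity-corrected z test. Function names and example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def sample_size_two_accuracies(p1, p2, alpha=0.05, power=0.8):
    """Return (n_prime, n): uncorrected and continuity-corrected sample
    sizes PER SAMPLE for comparing two independent accuracies."""
    norm = NormalDist()
    z_alpha2 = norm.inv_cdf(1 - alpha / 2)   # two-sided test
    z_beta = norm.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    q_bar = 1 - p_bar
    delta = abs(p2 - p1)                     # effect size
    n_prime = (z_alpha2 * sqrt(2 * p_bar * q_bar)
               + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    n = n_prime / 4 * (1 + sqrt(1 + 4 / (n_prime * delta))) ** 2
    return n_prime, n

def z_two_accuracies(x1, n1, x2, n2):
    """Continuity-corrected z statistic; x1 and x2 are the numbers of
    correctly labelled cases out of n1 and n2 testing cases."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)            # pooled proportion
    inv = 1 / n1 + 1 / n2
    return (abs(p1 - p2) - 0.5 * inv) / sqrt(p_bar * (1 - p_bar) * inv)

n_prime, n = sample_size_two_accuracies(0.85, 0.90)
print(f"n' = {n_prime:.1f}, n = {n:.1f} per sample")
```

For the same 5% effect size the requirement per sample is more than double that of the comparison against a fixed target, because both accuracies are now estimated with error.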

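Note:
1. Equations may be re-written, e.g.

$$ z_{\beta} = \frac{|P_2 - P_1|\sqrt{n'} - z_{\alpha/2}\sqrt{2\bar{P}\bar{Q}}}{\sqrt{P_1 Q_1 + P_2 Q_2}} $$

   with Q = 1 − P and \bar{P} = (P_1 + P_2)/2.
2. Can also use alternatives for related samples (e.g. McNemar test).
3. Instead of hypothesis testing could use confidence intervals.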
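For related samples, where the same testing cases are used to assess two classifications, the McNemar test uses only the discordant cases. The sketch below uses the usual z form of the test; the function name and the counts are hypothetical, and the optional continuity correction is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def mcnemar_z(f12, f21, continuity=False):
    """McNemar test for two classifications assessed on the same cases.

    f12: cases correct in classification 1 but wrong in classification 2
    f21: cases wrong in classification 1 but correct in classification 2
    Returns (z, two-sided p-value) under the normal approximation.
    """
    diff = abs(f12 - f21)
    if continuity:
        diff = max(diff - 1, 0)
    z = diff / sqrt(f12 + f21)
    p_value = 2 * (1 - NormalDist().cdf(z))
    return z, p_value

# Hypothetical discordant counts
z, p = mcnemar_z(f12=30, f21=15)
print(f"z = {z:.3f}, p = {p:.4f}")
```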

So what?
Remember, important to use appropriate sample size.
- Too large: any non-zero difference will appear statistically significant.
- Too small: fail to detect an important difference.
Sizes used in remote sensing range from 10s, 100s, 1000s to 10,000+.

[Figure: size needed, v target]

[Figure: size needed, v another accuracy]

Conclusions
- Error in ground truth can lead to systematic bias: it underestimates accuracy, and is correctable.
- Accuracy assessment often has a comparative component: this has implications for sample size (need to specify effect size, α, and β).
- Required size may be quite large.
- Need to be aware of danger of using inappropriate size (too small or too large).

The end