Basic Statistics. 1. Gross error analyst makes a gross mistake (misread balance or entered wrong value into calculation).

Similar documents
Basic Statistics. 1. Gross error analyst makes a gross mistake (misread balance or entered wrong value into calculation).

How to Describe Accuracy

Statistics: Error (Chpt. 5)

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

Statistical Analysis of Engineering Data The Bare Bones Edition. Precision, Bias, Accuracy, Measures of Precision, Propagation of Error

Statistical Analysis of Chemical Data Chapter 4

-However, this definition can be expanded to include: biology (biometrics), environmental science (environmetrics), economics (econometrics).

Topic 2 Measurement and Calculations in Chemistry

Inferences for Regression

Uncertainty, Error, and Precision in Quantitative Measurements an Introduction 4.4 cm Experimental error

ANALYTICAL CHEMISTRY - CLUTCH 1E CH STATISTICS, QUALITY ASSURANCE AND CALIBRATION METHODS

Chapter 7 Comparison of two independent samples

Chapter 3 Experimental Error

Lecture 3. - all digits that are certain plus one which contains some uncertainty are said to be significant figures

Test 3 Practice Test A. NOTE: Ignore Q10 (not covered)

Lectures 5 & 6: Hypothesis Testing

Practical Statistics for the Analytical Scientist Table of Contents

The SuperBall Lab. Objective. Instructions

LECTURE 5. Introduction to Econometrics. Hypothesis testing

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

Sociology 6Z03 Review II

Chapter 9 Inferences from Two Samples

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Solutions to Final STAT 421, Fall 2008

Question. Hypothesis testing. Example. Answer: hypothesis. Test: true or not? Question. Average is not the mean! μ average. Random deviation or not?

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong

E509A: Principle of Biostatistics. GY Zou

Analytical Chemistry. Course Philosophy

Basics of Experimental Design. Review of Statistics. Basic Study. Experimental Design. When an Experiment is Not Possible. Studying Relations

Business Statistics: Lecture 8: Introduction to Estimation & Hypothesis Testing

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.

COUNTING ERRORS AND STATISTICS RCT STUDY GUIDE Identify the five general types of radiation measurement errors.

CBA4 is live in practice mode this week exam mode from Saturday!

Tests about a population mean

Econometrics. 4) Statistical inference

This gives us an upper and lower bound that capture our population mean.

SPH3U1 Lesson 03 Introduction. 6.1 Expressing Error in Measurement

4.1 Hypothesis Testing

Mathematical Notation Math Introduction to Applied Statistics

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3

CHAPTER 9: HYPOTHESIS TESTING

CHAPTER 8. Test Procedures is a rule, based on sample data, for deciding whether to reject H 0 and contains:

Passing-Bablok Regression for Method Comparison

Appendix II Calculation of Uncertainties

Business Statistics. Lecture 10: Course Review

Outline. PubH 5450 Biostatistics I Prof. Carlin. Confidence Interval for the Mean. Part I. Reviews

appstats27.notebook April 06, 2017

Introduction to hypothesis testing

2.830J / 6.780J / ESD.63J Control of Manufacturing Processes (SMA 6303) Spring 2008

CH.9 Tests of Hypotheses for a Single Sample

Chapter 27 Summary Inferences for Regression

Counting Statistics and Error Propagation!

Sampling Distributions: Central Limit Theorem

Analysis of 2x2 Cross-Over Designs using T-Tests

Proposed Procedures for Determining the Method Detection Limit and Minimum Level

Mathematical statistics

Hypotheses and Errors

STA 101 Final Review

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Hypothesis Testing hypothesis testing approach

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

Introductory Econometrics. Review of statistics (Part II: Inference)

Chapter 24. Comparing Means. Copyright 2010 Pearson Education, Inc.

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015

Statistics: CI, Tolerance Intervals, Exceedance, and Hypothesis Testing. Confidence intervals on mean. CL = x ± t * CL1- = exp

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.

MECHANICAL ENGINEERING SYSTEMS LABORATORY

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

The Purpose of Hypothesis Testing

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.

Wilcoxon Test and Calculating Sample Sizes

1] What is the 95% confidence interval for 6 measurements whose average is 4.22 and with a standard deviation of 0.07?

GROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION

Introduction to Statistics

1 Statistical inference for a population mean

Tackling Statistical Uncertainty in Method Validation

Measurement uncertainty and legal limits in analytical measurements

SMAM 314 Exam 49 Name. 1.Mark the following statements true or false (10 points-2 each)

Statistical Inference

Econometrics Review questions for exam

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

STA Module 10 Comparing Two Proportions

Review of Statistics 101

Quantitative Methods for Economics, Finance and Management (A86050 F86050)

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

Two-Sample Inferential Statistics

Classroom Activity 7 Math 113 Name : 10 pts Intro to Applied Stats

Advanced Experimental Design

Final Exam STAT On a Pareto chart, the frequency should be represented on the A) X-axis B) regression C) Y-axis D) none of the above

Tables Table A Table B Table C Table D Table E 675

EC2001 Econometrics 1 Dr. Jose Olmo Room D309

Machine Learning Linear Classification. Prof. Matteo Matteucci

Uncertainty and Graphical Analysis

M(t) = 1 t. (1 t), 6 M (0) = 20 P (95. X i 110) i=1

Lecture 30. DATA 8 Summer Regression Inference

Methods and Tools of Physics

Section 9.4. Notation. Requirements. Definition. Inferences About Two Means (Matched Pairs) Examples

1 Descriptive statistics. 2 Scores and probability distributions. 3 Hypothesis testing and one-sample t-test. 4 More on t-tests

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Transcription:

Basic Statistics There are three types of error: 1. Gross error analyst makes a gross mistake (misread balance or entered wrong value into calculation). 2. Systematic error - always too high or too low (improper shielding and grounding of an instrument or error in the preparation of standards). 3. Random error - unpredictably high or low (pressure changes or temperature changes). Precision = ability to control random error. Accuracy = ability to control systematic error.

Statistics Experimental measurements always have some random error, so no conclusion can be drawn with complete certainty. Statistics give us a tool to accept conclusions that have a high probability of being correct. Deals with random error! There is always some uncertainty in a measurement. Challenge is to minimize this uncertaintity!!

Standard Deviation of the Mean Another very common way chemists represent error a measurement is to report a value called the standard deviation. The standard deviation of a small sampling is: s N i ( x N 1 x) s = standard deviation of the mean N = number of measurements or points x i = each individual measurement i 2 Trials Measurements 1 21.56 2 27.25 3 25.53 4 24.99 5 24.43 Mean 24.75 s 2.07 x = sample mean RSD = (s/mean) x 100 Coefficient of variance = (s) 2 24.75 ± 2.07

Standard Error of the Mean Another very common way to represent error is to report a value called the standard error. The standard error is related to standard deviation: S. E. s N s = standard deviation of the mean N = number of measurements or points 24.75 ± 0.93 Trials Measurements 1 21.56 2 27.25 3 25.53 4 24.99 5 24.43 Mean 24.75 s 2.07 S.E. 0.93

Confidence Limit Another common statistical tool for reporting the uncertainty (precision) of a measurement is the confidence limit (CL). C. L. t s N Table 22.1: Confidence Limit t-values as a function of (N-1) N-1 90% 95% 99% 99.5% 2 2.920 4.303 9.925 14.089 3 2.353 3.182 5.841 7.453 4 2.132 2.776 4.604 5.598 5 2.015 2.571 4.032 4.773 6 1.943 2.447 3.707 4.317 7 1.895 2.365 3.500 4.029 8 1.860 2.306 3.355 3.832 9 1.833 2.262 3.205 3.690 10 1.812 2.228 3.169 3.581 2.07. L. (2.776) 2.56 5 Trials Measurements 1 21.56 2 27.25 3 25.53 4 24.99 5 24.43 Mean 24.75 s 2.07 S.E. 0.93 C.L. 2.56 C 24.75 ± 2.56

Population versus Sample Size The term population is used when an infinite sampling occurred or all possible subjects were analyzed. Obviously, we cannot repeat a measurement an infinite number of times so quite often the idea of a population is theoretical. Sample size is selected to reflect population. μ 1σ 68.3% μ 2σ 95.5% μ 3σ 99.7% Median = middle number is a series of measurements Range = difference between the highest and lowest values

Q-Test Q-test is a statistical tool used to identify an outlier within a data set. Q exp x q = suspected outlier x n+1 = next nearest data point w = range (largest smallest data point in the set) Critical Rejection Values for Identifying an Outlier: Q-test Q crit N 90% CL 95% CL 99% CL 3 0.941 0.970 0.994 4 0.765 0.829 0.926 5 0.642 0.710 0.821 6 0.560 0.625 0.740 7 0.507 0.568 0.680 8 0.468 0.526 0.634 9 0.437 0.493 0.598 10 0.412 0.466 0.568 x q w x Cup 1 78 2 82 3 81 4 77 5 72 6 79 7 82 8 81 9 78 10 83 ppm Caffeine Mean 79.3 s 3.3 C.L. 95% 2.0

Calculation Q-test Example - Perform a Q-test on the data set from Table on previous page and determine if you can statistically designate data point #5 as an outlier within a 95% CL. If so, recalculate the mean, standard deviation and the 95% CL. Strategy Organize the data from highest to lowest data point and use Equation to calculate Q exp. Solution Ordering the data from Table 22.3 from highest to lowest results in Q exp 72 77 11 0.455 Using the Q crit table, we see that Q crit =0.466. Since Q exp < Q crit, you must keep the data point.

Grubbs Test The recommended way of identifying outliers is to use the Grubb s Test. A Grubb s test is similar to a Q-test however G exp is based upon the mean and standard deviation of the distribution instead of the next-nearest neighbor and range. Table: Critical Rejection Values for Identifying an Outlier: G-test G exp x q s x G crit N 90% C.L. 95% C.L. 99% C.L. 3 1.1.53 1.154 1.155 4 1.463 1.481 1.496 5 1.671 1.715 1.764 6 1.822 1.887 1.973 7 1.938 2.020 2.139 8 2.032 2.127 2.274 9 2.110 2.215 2.387 10 2.176 2.290 2.482 If G exp is greater than the critical G-value (G crit ) found in the Table then you are statistically justified in removing your suspected outlier.

How would you determine if the value is high or not? Is the value high or within the confidence interval of the normal counts? Mean = 5.2 x 10 6 counts s = 2.3 x 10 5 counts 5.2 ± 0.2 (x 10 6 counts) std. dev. 0.2 C. L. (2.776) 5 0.25 5.2 ± 0.3 (x 10 6 counts) 95% C.L. {4.9 to 5.5 is normal range} 5.6 (x 10 6 counts) is not within the normal range at this confidence interval.

Student s t Good to report mean values at the 95% confidence interval Confidence interval From a limited number of measurements, it is possible to find population mean and standard deviation.

Student s t

This would be your first step, for example, when comparing data from sample measurements versus controls. One wants to know if there is any difference in the means. Comparison of Standard Deviations Is s from the substitute instrument significantly greater than s from the original instrument? If F calculated > F table, then the F test (Variance test) F = s 1 2 difference is significant. s 2 2 Make s 1 >s 2 so that F calculated >1

F calculated = (0.47) 2 /(0.28) 2 = 2.8 2 F calculated (2.8 2 ) < F table (3.63) Therefore, we reject the hypothesis that s 1 is signficantly larger than s 2. In other words, at the 95% confidence level, there is no difference between the two standard deviations.

Hypothesis Testing Desire to be as accurate and precise as possible. Systematic errors reduce accuracy of a measurement. Random error reduces precision. The practice of science involves formulating and testing hypotheses, statements that are capable of being proven false using a test of observed data. The null hypothesis typically corresponds to a general or default position. For example, the null hypothesis might be that there is no relationship between two measured phenomena or that a potential treatment has no effect. In statistical inference of observed data of a scientific experiment, the null hypothesis refers to a general or default position: that there is no relationship (no difference) between two measured phenomena, or that a potential medical treatment has no effect. Rejecting or disproving the null hypothesis and thus concluding that there are grounds for believing that there is a relationship between two phenomena (there is a difference in values) or that a potential treatment has a measurable effect is a central task in the modern practice of science, and gives a precise sense in which a claim is capable of being proven false.

This would be the second step in the comparison of values after a decision is made regarding the F test. Comparison of Means This t test is used when standard deviations are not significantly different.!!! s pooled is a pooled standard deviation making use of both sets of data. If t calculated > t table (95%), the difference between the two means is statistically significant!

Comparison of Means This t test is used when standard deviations are significantly different!!! If t calculated > t table (95%), the difference between the two means is statistically significant!

Grubbs Test for Outlier (Data Point) If G calculated > G table, then the questionable value should be discarded! G calculated = 2.13 G table (12 observations) = 2.285 Value of 7.8 should be retained in the data set.

Linear Regression Analysis The method of least squares finds the best straight line through experimental data.

Linear Regression Analysis Variability in m and b can be calculated. The first decimal place of the standard deviation in the value is the last significant digit of the slope or intercept.

Use Regression Equation to Calculate Unknown Concentration y (background corrected signal) = m x (concentration) + b x = (y - b)/m Report x ± uncertainty in x s y is the standard deviation of y. k is the number of replicate measurements of the unknown. n is the number of data points in the calibration line. y (bar) is the mean value of y for the points on the calibration line. x i are the individual values of x for the points on the calibration line. x (bar) is the mean value of x for the points on the calibration line.