Descriptive Statistics

Similar documents
T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

Section 9.4. Notation. Requirements. Definition. Inferences About Two Means (Matched Pairs) Examples

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015

The Chi-Square Distributions

The Chi-Square Distributions

Ch. 7. One sample hypothesis tests for µ and σ

LAB 2. HYPOTHESIS TESTING IN THE BIOLOGICAL SCIENCES- Part 2

PLSC PRACTICE TEST ONE

Single Sample Means. SOCY601 Alan Neustadtl

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institution of Technology, Kharagpur

Classroom Activity 7 Math 113 Name : 10 pts Intro to Applied Stats

Ordinary Least Squares Regression Explained: Vartanian

Harvard University. Rigorous Research in Engineering Education

Section 9.5. Testing the Difference Between Two Variances. Bluman, Chapter 9 1

Hypothesis Testing. Week 04. Presented by : W. Rofianto

Statistical Analysis for QBIC Genetics Adapted by Ellen G. Dow 2017

-However, this definition can be expanded to include: biology (biometrics), environmental science (environmetrics), economics (econometrics).

How to Describe Accuracy

Review 6. n 1 = 85 n 2 = 75 x 1 = x 2 = s 1 = 38.7 s 2 = 39.2

Chapter 7 Comparison of two independent samples

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series

Stat 231 Exam 2 Fall 2013

Advanced Experimental Design

Lecture 26: Chapter 10, Section 2 Inference for Quantitative Variable Confidence Interval with t

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies

Hypothesis Testing. Hypothesis: conjecture, proposition or statement based on published literature, data, or a theory that may or may not be true

LOOKING FOR RELATIONSHIPS

Density Temp vs Ratio. temp

Normal (Gaussian) distribution The normal distribution is often relevant because of the Central Limit Theorem (CLT):

Two-Sample Inferential Statistics

General Certificate of Education Advanced Level Examination June 2013

Test 3 Practice Test A. NOTE: Ignore Q10 (not covered)

40.2. Interval Estimation for the Variance. Introduction. Prerequisites. Learning Outcomes

Salt Lake Community College MATH 1040 Final Exam Fall Semester 2011 Form E

23. MORE HYPOTHESIS TESTING

Business Statistics. Lecture 10: Course Review

Confidence intervals

POLI 443 Applied Political Research

Basic Statistics. 1. Gross error analyst makes a gross mistake (misread balance or entered wrong value into calculation).

D. A 90% confidence interval for the ratio of two variances is (.023,1.99). Based on the confidence interval you will fail to reject H 0 =!

Tables Table A Table B Table C Table D Table E 675

Chapter 8 Handout: Interval Estimates and Hypothesis Testing

Quantitative Methods for Economics, Finance and Management (A86050 F86050)

Acknowledge error Smaller samples, less spread

Lecture 30. DATA 8 Summer Regression Inference

Background to Statistics

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Formulas and Tables. for Essentials of Statistics, by Mario F. Triola 2002 by Addison-Wesley. ˆp E p ˆp E Proportion.

Solutions to Practice Test 2 Math 4753 Summer 2005

The t-statistic. Student s t Test

What If There Are More Than. Two Factor Levels?

Chapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania

An Analysis of College Algebra Exam Scores December 14, James D Jones Math Section 01

Math 2000 Practice Final Exam: Homework problems to review. Problem numbers

Mathematical Notation Math Introduction to Applied Statistics

ST430 Exam 2 Solutions

CENTRAL LIMIT THEOREM (CLT)

Difference in two or more average scores in different groups

Inferential Statistics

Note that we are looking at the true mean, μ, not y. The problem for us is that we need to find the endpoints of our interval (a, b).

STP 226 EXAMPLE EXAM #3 INSTRUCTOR:

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

Quantitative Analysis and Empirical Methods

Estimating a Population Mean

Inferences about central values (.)

Chapter 10. Correlation and Regression. McGraw-Hill, Bluman, 7th ed., Chapter 10 1

Final Exam - Solutions

Final Exam - Solutions

Chapter 16. Simple Linear Regression and Correlation

Inference for Regression Simple Linear Regression

Chemometrics. Matti Hotokka Physical chemistry Åbo Akademi University

How do we compare the relative performance among competing models?

10.4 Hypothesis Testing: Two Independent Samples Proportion

General Certificate of Education Advanced Level Examination June 2014

χ test statistics of 2.5? χ we see that: χ indicate agreement between the two sets of frequencies.

EXAM 3 Math 1342 Elementary Statistics 6-7

Sampling distribution of t. 2. Sampling distribution of t. 3. Example: Gas mileage investigation. II. Inferential Statistics (8) t =

COSC 341 Human Computer Interaction. Dr. Bowen Hui University of British Columbia Okanagan

Lectures 5 & 6: Hypothesis Testing

t-test for b Copyright 2000 Tom Malloy. All rights reserved. Regression

Formulas and Tables for Elementary Statistics, Eighth Edition, by Mario F. Triola 2001 by Addison Wesley Longman Publishing Company, Inc.

P-values and statistical tests 3. t-test

79 Wyner Math Academy I Spring 2016

16.400/453J Human Factors Engineering. Design of Experiments II

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

Probability and Statistics Notes

Introduction to Uncertainty. Asma Khalid and Muhammad Sabieh Anwar

Midterm 2 - Solutions

Categorical Data Analysis. The data are often just counts of how many things each category has.

Soc3811 Second Midterm Exam

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output

Chapter 10: STATISTICAL INFERENCE FOR TWO SAMPLES. Part 1: Hypothesis tests on a µ 1 µ 2 for independent groups

Summer Review for Mathematical Studies Rising 12 th graders

Estimating the accuracy of a hypothesis Setting. Assume a binary classification setting

Sampling, Confidence Interval and Hypothesis Testing

MAT 2377C FINAL EXAM PRACTICE

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary

One-Way ANOVA. Some examples of when ANOVA would be appropriate include:

Table 1: Fish Biomass data set on 26 streams

Transcription:

Descriptive Statistics Once an experiment is carried out and the results are measured, the researcher has to decide whether the results of the treatments are different. This would be easy if the results were perfectly consistent: Corn heights for Treatment 1 (in): 32, 32, 32, 32, 32, 32, 32, 32 Corn heights for Treatment 2: (in) 36, 36, 36, 36, 36, 36, 36, 36 Obviously treatment 2 results in taller corn. Unfortunately, real life results are not so simple: Corn heights for Treatment 1 (in): 28, 32, 36, 39, 25, 30, 34, 32 Corn heights for Treatment 2 (in): 33, 32, 38, 36, 40, 39, 34, 36 Differences are not obvious, so we need statistics. Statistics describe a sample and use Latin symbols. Parameters describe a population and use Greek symbols. Statistics are used when individual characteristics are variable. Note that you have to measure several individuals to know how variable they are, in other words, you need replication. Some physical properties are very consistent, that is, they have low variability. An example might be the speed at which steel balls fall in a vacuum - the biggest source of variability is likely to be the accuracy of the timing device. How many sheets of paper do you need to measure to know the average length of the sheets of paper in a ream? How many do you need to measure to know whether the lengths of sheets in 2 reams are the same or different? Biological properties, on the other hand, tend to have a fairly high variability due to the many variations in genetics and environment even within a single species. How many class participant heights do you need to measure to know the average height of people in a classroom? How many do you need to measure to know whether the average heights of people on the left and right sides of the room are the same or different? Most populations and samples follow a normal or Gaussian distribution that looks like a bell-shaped curve. 1

Normal Distribution Describing a normal distribution The distribution is described by: Mean: central value Variance: measure of the scatter about the central value Sums of squares: squares of deviations from the mean, numerator of the variance formula Degrees of freedom: number of independently calculated differences from the mean. If 2

the mean is calculated from the sample, the value of the last observation can be calculated from the mean and the values of all the other observations. Thus there are only n - 1 independent observations. df = n - 1 Standard deviation: measure of the scatter of the individuals about the central value. Units are the same as for the original data. A higher variance or standard deviation describes a wider curve. Coefficient of Variation: ratio of the size of the scatter (standard deviation) to the size of the mean, expressed as percent. Allows comparisons of relative variability of different populations, for example, whether the weights of elephants are more variable than the weights of mice. The statistics calculated so far describe samples and populations, but do not test for differences between samples and populations. For such tests the distributions of sample means are needed. 3

Example: Speed (-2 minutes) in seconds of two calculating machines for computing sums of squares Machine A Machine B Replication Time (sec) Deviation from Mean Dev. 2 Time (sec) Deviation from Mean Dev. 2 A - B Rep. or Pair Totals 1 30* 8 64 14 0 0 16 44 2 21-1 1 21* 7 49 0 42 3 22* 0 0 5-9 81 17 27 4 22 0 0 13* -1 1 9 35 5 19-3 9 14* 0 0 5 33 6 29* 7 49 17 3 9 12 46 7 17-5 25 8* -6 36 9 25 8 14* -8 64 16 2 4-2 30 9 23* 1 1 8-6 36 15 31 10 23 1 1 24* 10 100-1 47 Total ÓX=220 Óx 2 =214 ÓX=140 Óx 2 =316 Ó=80 4

Distribution of Sample Means Taking many samples from a population and calculating the mean for each sample gives a new distribution that has the same mean as the population distribution but is narrower. The width of the distribution depends on the sizes of the samples taken. The means of large samples are less likely to be in the tails than are the means of small samples, so the distribution becomes narrower as the samples get larger. For the distribution of sample means Student calculated the distribution of the deviations of the sample means from the population mean relative to the standard error. This t distribution gives the probability of finding a difference between a sample mean and true mean greater than any chosen t. The distribution depends on the sample size or df. Student t Distribution with 18 df 5

Cutoff points are shown on the graph for the probability of finding a larger t is less than 5%. The points occur at t.05,18 = 2.101 The probability of finding a t that is in either one of the two tails, ie a t larger than a cutoff point, is given in the t table. Distribution of t Degrees of Freedom 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Probability of Obtaining a Value as Large or Larger 0.400 0.200 0.100 0.050 0.010 0.001 1.376 1.061 0.978 0.941 0.920 0.906 0.895 0.889 0.883 0.978 0.876 0.873 0.870 0.868 0.866 0.865 0.863 0.862 0.861 0.860 3.078 1.886 1.638 1.533 1.476 1.440 1.415 1.397 1.383 1.372 1.363 1.356 1.350 1.345 1.341 1.337 1.333 1.330 1.328 1.325 6.314 2.920 2.353 2.132 2.015 1.943 1.895 1.860 1.833 1.812 1.796 1.782 1.771 1.761 1.753 1.746 1.740 1.734 1.729 1.725 12.706 4.303 3.182 2.776 2.571 2.447 2.365 2.306 2.262 2.228 2.201 2.179 2.160 2.145 2.131 2.120 2.110 2.101 2.093 2.086 63.657 9.925 5.841 4.604 4.032 3.707 3.499 3.355 3.250 3.169 3.106 3.055 3.012 2.977 2.947 2.921 2.898 2.878 2.861 2.845 31.598 12.941 8.610 6.859 5.959 5.405 5.041 4.781 4.587 4.437 4.318 4.221 4.140 4.073 4.015 3.965 3.922 3.883 3.850 t values and probabilities can be looked up on the web at http://members.aol.com/johnp71/pdfs.html The probabilities in the t distribution provide the means of testing for differences between samples. 6

Confidence Limits: for a given probability, the limits of the range within which the true mean lies. Determine the confidence limits for ì for calculating machine A, given: For P = 0.95 For P = 0.99 t (.05,9) = 2.262 t (.01,9) = 3.250 CL = 22 ± (2.262)(1.54) CL = 22 ± (3.250)(1.54) = 22 ± 3.5 = 22 ± 5.005 = 18.5, 25.5 = 17.0, 27.0 16 17 18 19 20 21 22 23 24 25 26 27 28 sec 18.5 P = 0.95 25.5 17.0 P = 0.99 27.0 With a confidence of 95%, we can say the true mean, ì, is in the range 18.5 to 25.5. There is 1 chance in 20, or 5 in 100, that the true mean for machine A lies outside this range. Null Hypothesis: a statement that there is no difference between the parameters involved Under the null hypothesis, the mean of machine A is equal to the mean of machine B. If the null hypothesis is true, the true mean of machine B will lie in the range 18.5 to 25.5. 7

If the true mean for machine B lies within the expected range, we accept the null hypothesis that there is no difference. If the range in which the true mean for machine B lies does not overlap the range for machine A, the true mean of B lies outside the range and there is a significant difference between the two machines, i.e. we have 95% confidence that the means are different. The null hypothesis is rejected. Note that the null hypothesis is essential for defining the probability and range in which the second mean is expected to lie if there is no difference. This makes it possible to test for differences with a known confidence level. If the null hypothesis is rejected, the alternative hypothesis is accepted that the mean of A is not the same as the mean of B. Possible forms are: A B A > B A < B 8

Rounding Data Record to a number place that is 1/4 of the standard deviation per unit. If s= 6.96 kg/experimental unit 6.96/4 = 1.74 Since the first number place in 1.74 is in the one s position, record data to the nearest kg. If s = 2.5 kg/unit 2.5/4 = 0.625 Since the first number place in 0.625 is in the tenth s position, record data to the nearest tenth of a kg or 1 decimal place. Rounding Means Round means to a number place that is 1/4 of the standard error of the mean. If s = 6.96 with 5 reps 3.11/4 = 0.8 Round means to the nearest tenth or to 1 decimal place. t Test The t test compares the difference between two means to the standard error of the means. As shown before, the t test to compare the difference between the sample mean and the population mean is: The t test can be used to directly test the difference between two means. The t test to compare the difference between means from two different treatments is: 9

Degrees of Freedom (df) for t 1. If samples are from two populations, the df is the sum of the df for the two populations. 2. If pairs of values or replicated comparisons are being compared, the df is the number of pairs - 1. For the two calculating machines with 9 + 9 = 18 df Compare to the table value of t.05,18 = 2.101. Since 3.298 > 2.101, the two means are different with 95% confidence. Comparing the calculated t statistic to the table value at 1%, what do you conclude? 10

Abbreviations in Statistics Sample Population Statistic Preferred symbol Acceptable symbol Arithmetic mean Chi-square Correlation coefficient Coefficient of multiple determination Coefficient of simple determination Coefficient of variation Degrees of freedom Least significant difference Multiple correlation coefficient Not significant Probability of type I error Probability of type II error Regression coefficient Sample size Standard error of mean Standard deviation of sample Student s t Variance Variance ratio 2 r R 2 r 2 CV df LSD R NS á â b n SE SD t s 2 F The symbols *, **, and *** are used to show significance at the P = 0.05, 0.01, and 0.001 levels, respectively. Significance at other levels is designated by a supplemental note. From: Publications Handbook and Style Manual. 1998. Amer. Soc. Agron. Inc., Crop Sci. Soc. Amer. Inc., Soil Sci. Soc. Amer., Madison, WI. DF s ì â N ó ó 2 11