
MATH 183 Basic Statistics
Dr. Neal, WKU

Let $\Omega$ be a population under consideration and let $X$ be a specific measurement that we are analyzing. For example, $\Omega$ = All U.S. households and $X$ = Number of children (under age 18) living in the household. To study this scenario, we obtain a set of measurements $\{x_1, x_2, \ldots, x_n\}$, which may be either a census or simply a random sample.

Census

In a census, we assume that we have a measurement from every member of the population under consideration. For small populations, such as the students in one particular class or the players on one sports team, it is not hard to obtain a census by surveying each person in that population. But for extremely large populations, such as all U.S. households, it is nearly impossible to obtain a true census, even when one is mandated every ten years by the United States Constitution. When we do have a census of measurements $X$ from a population, we can find the true values of the mean $\mu$, the variance $\sigma^2$, and the standard deviation $\sigma$, as well as other population parameters.

Mean

Given a set of measurements $\{x_1, x_2, \ldots, x_n\}$, the mean (or average) of these specific values is given by

$$\mu = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$

When the values are a census of a specific measurement $X$ from a population $\Omega$, then $\mu$ is the true average value. It is also called the expected value of $X$ and may be denoted by $\mu_X$ or $E[X]$.

Variance

The variance, denoted by $\sigma^2$, is the average squared distance from the mean and is given by

$$\sigma^2 = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2}{n} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2.$$

Alternately, the variance is the average of the squares minus the square of the average, and can be computed by

$$\sigma^2 = \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n} - \mu^2.$$

The variance is sometimes denoted by $\sigma_X^2$ or $\mathrm{Var}(X)$.
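As a minimal sketch (Python is chosen here only for illustration), the definition of $\sigma^2$ and the "average of the squares minus the square of the average" shortcut can be checked against each other. The list `data` is a made-up census used only for this check, and exact `Fraction` arithmetic is assumed so that the two formulas agree exactly rather than up to rounding.

```python
from fractions import Fraction

def population_mean(xs):
    """mu = (x_1 + ... + x_n) / n, computed exactly with fractions."""
    return sum(Fraction(x) for x in xs) / len(xs)

def population_variance(xs):
    """Definition: sigma^2 = (1/n) * sum of (x_i - mu)^2."""
    mu = population_mean(xs)
    return sum((Fraction(x) - mu) ** 2 for x in xs) / len(xs)

def population_variance_shortcut(xs):
    """Shortcut: average of the squares minus the square of the average."""
    mu = population_mean(xs)
    return sum(Fraction(x) ** 2 for x in xs) / len(xs) - mu ** 2

# Hypothetical census of eight measurements (illustration only).
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_mean(data))               # 5
print(population_variance(data))           # 4
print(population_variance_shortcut(data))  # 4
```

Both variance functions return the same value for any list of numbers. The shortcut is what the calculator effectively uses: it only needs the running sums of $x_i$ and $x_i^2$.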

Standard Deviation

We take the square root of the variance to get the standard deviation, denoted by $\sigma$:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}.$$

The standard deviation measures the average spread from the mean. A small $\sigma$ means that measurements are consistently close to the average $\mu$.

Median and Mode

When the measurements $\{x_1, x_2, \ldots, x_n\}$ are in increasing order, the median is the middle value, or the average of the two middle values if there is an even number of measurements. The mode is the measurement (or measurements) that occurs most often.

Example 1. Below are the numbers of credit hours enrolled this semester for all students in one section of MATH 116. Find the mean, variance, standard deviation, median, and mode of these values. What percentage of these measurements are within one standard deviation of the class average?

Credit Hours Taken This Semester
18 15 14 13 18 14 14 18 17 15 16 18
18 18 15 15 15 17 18 13 15 18 14 18
15 18 15 19 18 14 16 17 16 14 15 15

Solution. Let $\Omega$ = this specific MATH 116 class and let $X$ = Number of credit hours enrolled in this semester. Because we have a census of this class, we can find the true mean $\mu$ and the true standard deviation $\sigma$. To do so, we shall enter the data into the calculator, sort it into increasing order, and use the 1-Var Stats command.

[Calculator screens: Enter data into L1; Sort data, then enter 1-Var Stats L1; Output; Scroll down]

The mean is

$$\mu = \frac{18 + 15 + 14 + \cdots + 15}{36} = \frac{576}{36} = 16 \text{ credit hours}.$$

Note: The calculator displays this value as $\bar{x}$, which stands for sample mean. But because we have a census of this class and not merely a sample, we use $\mu$ to indicate that we have the true average of $\mu = 16$ credit hours.

The variance is computed by

$$\sigma^2 = \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n} - \mu^2 = \frac{9324}{36} - 16^2 = 259 - 256 = 3.$$

Then taking the square root gives us the standard deviation $\sigma = \sqrt{3} \approx 1.732$. The true standard deviation is displayed as σx on the calculator output, and this is the value to use when we have a census of measurements. So now we can say that the class average is 16 credit hours, with an average spread from 16 of $\sigma \approx 1.732$ credit hours.

The median is the middle measurement. But because we have an even number of measurements (36), we must take the average of the two middle measurements. After sorting the 36 values, the middle values are in the 18th and 19th positions. The 18th value is 15 while the 19th value is 16. So the median is (15 + 16)/2 = 15.5, which is also displayed on the TI.

After sorting the values, it is easy to make a frequency chart, from which we see that the mode is 18 hours. That is, in this class more students are registered for 18 hours than for any other number of hours.

| # Hours | # Students |
|---|---|
| 13 | 2 |
| 14 | 6 |
| 15 | 10 |
| 16 | 3 |
| 17 | 3 |
| 18 | 11 |
| 19 | 1 |
| Total | N = 36 |

To find the percentage within one standard deviation of the average, we first compute $\mu \pm \sigma = 16 \pm 1.732$, which is about 14.268 to 17.732. So students taking 15, 16, or 17 hours fall in this range. There are 10 + 3 + 3 = 16 students in this range. Thus, 16/36, or 44.44%, of the students in this class are within one standard deviation of the class average.

Question: Is this class representative of all students on campus? Of just undergraduates? Of all students taking a Gen. Ed. math class this semester? Or perhaps of just MATH 116 students this semester? Probably the most we can say is that this class is representative of all MATH 116 students this semester. If you want a sample that is representative of a larger portion of the student body, then you must sample accordingly from among that entire group of students. But you should never take an existing sample and claim that it is representative of a larger group that was not represented in the sample.
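The whole of Example 1 can be cross-checked with a short Python sketch. This is not the calculator procedure used in the text; it simply recomputes the same census quantities with the standard library, using the shortcut variance formula and `statistics.pstdev` (the population standard deviation, which divides by $n$) as an independent check.

```python
import math
from collections import Counter
from statistics import median, pstdev

# Example 1: credit hours for every student in one MATH 116 section (a census).
hours = [18, 15, 14, 13, 18, 14, 14, 18, 17, 15, 16, 18,
         18, 18, 15, 15, 15, 17, 18, 13, 15, 18, 14, 18,
         15, 18, 15, 19, 18, 14, 16, 17, 16, 14, 15, 15]

n = len(hours)
mu = sum(hours) / n                             # 576 / 36
var = sum(x * x for x in hours) / n - mu ** 2   # 9324/36 - 16^2 (shortcut formula)
sigma = math.sqrt(var)                          # corresponds to the calculator's sigma-x
assert math.isclose(sigma, pstdev(hours))       # stdlib population SD agrees

med = median(hours)                   # mean of the 18th and 19th sorted values
freq = Counter(hours)                 # frequency chart: hours -> number of students
top = max(freq.values())
modes = sorted(h for h, k in freq.items() if k == top)

within = sum(1 for x in hours if mu - sigma <= x <= mu + sigma)
print(n, mu, var, round(sigma, 3), med, modes)   # 36 16.0 3.0 1.732 15.5 [18]
print(sorted(freq.items()))   # [(13, 2), (14, 6), (15, 10), (16, 3), (17, 3), (18, 11), (19, 1)]
print(within, f"{100 * within / n:.2f}%")        # 16 44.44%
```

`Counter` reproduces the frequency chart above. The mode is computed as a list because a data set can have more than one most-frequent value.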

Sample Mean and Sample Deviation

Often a collection of measurements is just a sample from a larger population. In this case, we cannot find the true average $\mu$. Instead we can only compute the sample mean, denoted by $\bar{x}$. However, $\bar{x}$ is computed the same way as $\mu$, by adding up the values and dividing by $n$. We just denote it by $\bar{x}$ to specify that we are only working with a sample. The sample deviation, denoted by $S$, is computed similarly to $\sigma$, with two differences. We use $\bar{x}$ in the formula rather than $\mu$, and we average the squared differences by dividing by $n - 1$ rather than $n$:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2} \quad \text{(for a census)}, \qquad S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} \quad \text{(for a sample)}.$$

By dividing by $n - 1$, the sample variance $S^2$ becomes an unbiased estimator of the true unknown variance $\sigma^2$. That is, the average of all possible $S^2$ from all possible samples of size $n$ will equal the true variance $\sigma^2$.

Quartiles and 1.5 IQR

The first quartile $Q_1$ is the median of just the measurements that are below the overall median. The third quartile $Q_3$ is the median of just the measurements that are above the overall median. These values are displayed, along with the minimum, median, and maximum, in the 1-Var Stats output. Together, the values min, $Q_1$, med, $Q_3$, max make up the five-number summary. The 1.5 IQR (or 1.5 Interquartile Range) is the interval

$$Q_1 - 1.5\,(Q_3 - Q_1) \quad \text{to} \quad Q_3 + 1.5\,(Q_3 - Q_1).$$

Values from a sample that are outside this range are called outliers and are often excluded from samples so as not to throw off the average too much.

Example 2. Below are data on city mpg from a sample of two-seater cars:

| Model | City MPG | Model | City MPG |
|---|---|---|---|
| Acura NSX | 17 | Honda Insight | 57 |
| Audi TT Quattro | 20 | Honda S2000 | 20 |
| Audi TT Roadster | 22 | Lamborghini Murcielago | 9 |
| BMW M Coupe | 17 | Mazda Miata | 22 |
| BMW Z3 Coupe | 19 | Mercedes-Benz SL500 | 16 |
| BMW Z3 Roadster | 20 | Mercedes-Benz SL600 | 13 |
| BMW Z8 | 13 | Mercedes-Benz SLK230 | 23 |
| Chevrolet Corvette | 18 | Mercedes-Benz SLK320 | 20 |
| Chrysler Prowler | 18 | Porsche 911 GT2 | 15 |
| Ferrari 360 Modena | 11 | Porsche Boxster | 19 |
| Ford Thunderbird | 17 | Toyota MR2 | 25 |

Use your calculator for the following:
(i) Find the sample mean and sample deviation, the median, the mode, and the five-number summary. What percentage of these mileages are within one sample deviation of the sample average?
(ii) Make a histogram with range [5, 60] divided into bins of length 5. Which bin has the most measurements? The second most?
(iii) Give the 1.5 IQR and identify any suspected outliers.

Solution. (i) We first enter the data into a list in the STAT EDIT screen. For this problem we shall use L2. After entering the data, we sort it with the command SortA(L2). Then we compute the statistics with the command 1-Var Stats L2.

[Calculator screens: Enter data into L2; Sort and compute stats; Output; Scroll down]

Because the data are only a sample of measurements from the population $\Omega$ of all two-seater makes of cars, the value $\bar{x} \approx 19.59$ is the sample mean. The sample deviation is displayed as Sx, so $S \approx 9.22$. The minimum value is shown to be 9 while the maximum value is 57. The median is given as 18.5. That is, 18.5 is the average of the two middle measurements when they are in increasing order (the 11th and 12th in this data set of even size 22). The 11th value is 18 while the 12th value is 19, so the median is (18 + 19)/2 = 18.5. The first quartile is $Q_1 = 16$, which is the median of all values below 18.5. And $Q_3 = 20$, which is the median of all values above 18.5. So the five-number summary is 9, 16, 18.5, 20, 57. By scrolling down the sorted list, we see that the mode is 20, which occurs most often (4 times). The range $\bar{x} \pm S = 19.59 \pm 9.22$, which is 10.37 to 28.81, contains 20 of the 22 measurements. So we can say that 90.9% of these mileages are within one sample deviation of the sample average.

(ii) Adjust the WINDOW and STAT PLOT settings to see a histogram (3rd type). Press GRAPH, then TRACE, and scroll to see the bin range values. The range [15, 20) has the most measurements, with 9 values, while the range [20, 25) has the second most, with 7 values.

[Calculator screens: Adjust WINDOW; Adjust STAT PLOT; Histogram; TRACE]
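Parts (i) and (ii) of Example 2 can be cross-checked with the following Python sketch. `statistics.stdev` divides by $n - 1$, so it gives the sample deviation $S$. The quartiles are coded directly from the "median of the values below (above) the median" rule in the text, because other software (for example `statistics.quantiles`) may use a different quartile convention and give slightly different values. The helper name `five_number_summary` is our own, not a library function.

```python
from collections import Counter
from statistics import mean, median, stdev

# Example 2: city mpg for the sample of 22 two-seater cars.
city_mpg = [17, 20, 22, 17, 19, 20, 13, 18, 18, 11, 17,
            57, 20, 9, 22, 16, 13, 23, 20, 15, 19, 25]

def five_number_summary(xs):
    """min, Q1, median, Q3, max, with Q1 (Q3) taken as the median of the
    values below (above) the overall median, as described in the text."""
    s = sorted(xs)
    n = len(s)
    lower = s[: n // 2]          # values below the middle position(s)
    upper = s[(n + 1) // 2 :]    # values above the middle position(s)
    return s[0], median(lower), median(s), median(upper), s[-1]

xbar = mean(city_mpg)
S = stdev(city_mpg)                      # sample deviation: divides by n - 1
summary = five_number_summary(city_mpg)
mode, mode_count = Counter(city_mpg).most_common(1)[0]
within = sum(1 for x in city_mpg if xbar - S <= x <= xbar + S)

# Part (ii): histogram on [5, 60) with bins of length 5.
bins = Counter((x - 5) // 5 for x in city_mpg)       # bin k covers [5 + 5k, 10 + 5k)
hist = {(5 + 5 * k, 10 + 5 * k): bins[k] for k in range(11)}

print(round(xbar, 2), round(S, 2), summary)   # 19.59 9.22 (9, 16, 18.5, 20, 57)
print(mode, mode_count, within, f"{100 * within / len(city_mpg):.1f}%")   # 20 4 20 90.9%
print(sorted(hist.items(), key=lambda kv: -kv[1])[:2])   # [((15, 20), 9), ((20, 25), 7)]
```

For an odd number of values, the slices `s[: n // 2]` and `s[(n + 1) // 2 :]` leave out the median itself. For an even number they split the sorted list exactly in half, which matches the calculator results quoted above.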

(iii) The 1.5 IQR is the interval $Q_1 - 1.5\,(Q_3 - Q_1)$ to $Q_3 + 1.5\,(Q_3 - Q_1)$, where $Q_3 - Q_1 = 20 - 16 = 4$. So the 1.5 IQR is $16 - 1.5 \times 4$ to $20 + 1.5 \times 4$, or 10 to 26. Thus, the outliers are the values outside this range, which are 9 mpg and 57 mpg.

Frequency Charts

Often measurements are given in a frequency chart that states how many times each measurement occurs:

| Measurement | $x_1$ | $x_2$ | $x_3$ | ... | $x_m$ |
|---|---|---|---|---|---|
| Frequency | $k_1$ | $k_2$ | $k_3$ | ... | $k_m$ |

Now we let $n = k_1 + \cdots + k_m$ = total number of measurements. Then the mean $\mu$ is actually a weighted average given by

$$\mu = \frac{k_1 x_1 + \cdots + k_m x_m}{n}.$$

When using the calculator, enter the measurements into one list and the frequencies into another list.

Example 3. A survey on the number of children per household was taken throughout a neighborhood. Here are the results from the sample that was obtained.

| Number of children | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| Number of households | 60 | 42 | 86 | 59 | 22 | 4 | 2 |

(i) Find the mean and deviation, the median, the mode, and the five-number summary for the number of children in this sample of households. What percentage of these households are within one deviation of the average?
(ii) Make a histogram with bins of length 1.
(iii) Give the 1.5 IQR and identify the outliers.

Solution. Here $\Omega$ = All households in this neighborhood and $X$ = Number of children in the household. We shall use list L3 for the measurements and list L4 for the frequencies, then enter the command 1-Var Stats L3, L4.

(i) Because we have a sample, $\bar{x} \approx 1.86$ children with $S \approx 1.34$. The median is 2 and the mode is 2. The five-number summary is 0, 1, 2, 3, 6.

Next we compute $\bar{x} \pm S = 1.86 \pm 1.34$, which is 0.52 to 3.2. This range includes all households having 1, 2, or 3 children. There are 42 + 86 + 59 = 187 such households out of 275, so 68% of the households are within one sample deviation of the sample average.

(iii) The 1.5 IQR is from $1 - 1.5 \times 2$ to $3 + 1.5 \times 2$, or $-2$ to 6. Thus there are no outliers, because all measurements are within this range.

Exercise 1. Consider the Verbal ACT scores from a group of English majors at WKU:
16, 18, 20, 21, 21, 22, 22, 23, 24, 25, 26, 27, 30, 34
(a) Make a histogram with range [15, 36] and bins of length 3. Which bin range has the most scores?
(b) Assuming this group is the entire population under consideration: (i) Find the true mean. (ii) Find and explain the median and the mode. (iii) Find the true standard deviation. (iv) Compute the percentage of these students whose Verbal ACT score is within one standard deviation of the average.
(c) Assuming this group is only a sample from a larger population $\Omega$: (i) Find the sample mean and sample deviation. (ii) Give the boundaries of the 1.5 IQR and state the outliers. (iii) In this case, what is the appropriate larger population $\Omega$ that this sample could represent?

Exercise 2. A group of WKU freshmen were asked to give the number of hours taken during their first semester. The results were:

| Hours | 13 | 14 | 14.5 | 15 | 15.5 | 16 | 16.5 | 17 | 18 |
|---|---|---|---|---|---|---|---|---|---|
| # Freshmen | 4 | 5 | 8 | 14 | 8 | 23 | 12 | 18 | 8 |

(a) Make a histogram with bins of length 1. Which bin range has the most values?
(b) Assuming this group is the entire population under consideration: (i) Find the true mean. (ii) Find and explain the median and the mode. (iii) Find the true standard deviation. (iv) Compute the percentage of these students whose number of hours is within one standard deviation of the average.
(c) Assuming this group is only a sample from a larger population $\Omega$: (i) Find the sample mean and sample deviation. (ii) Explain $Q_1$ and $Q_3$. (iii) Give the boundaries of the 1.5 IQR and state the outliers. (iv) In this case, what is the appropriate larger population $\Omega$ that this sample could represent?
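Returning to Example 3, the weighted-average formula for frequency charts can be sketched in Python. The weighted sample deviation below uses the frequencies $k_i$ as weights and divides by $n - 1$. We assume this is equivalent to applying the sample formula to the chart expanded into individual measurements, and the code checks that equivalence with `statistics.stdev`. The same pattern would apply to the frequency chart in Exercise 2.

```python
import math
from statistics import median, stdev

# Example 3: frequency chart of children per household (a sample of households).
children   = [0, 1, 2, 3, 4, 5, 6]
households = [60, 42, 86, 59, 22, 4, 2]

n = sum(households)                                          # total number of measurements
xbar = sum(k * x for x, k in zip(children, households)) / n  # weighted average
ss = sum(k * (x - xbar) ** 2 for x, k in zip(children, households))
S = math.sqrt(ss / (n - 1))                                  # sample deviation

# Expanding the chart into individual measurements gives the median and quartiles.
values = sorted(x for x, k in zip(children, households) for _ in range(k))
assert math.isclose(S, stdev(values))                        # same result as the raw list
med = median(values)
q1 = median(values[: n // 2])          # median of the values below the median
q3 = median(values[(n + 1) // 2 :])    # median of the values above the median
mode = children[households.index(max(households))]

within = sum(k for x, k in zip(children, households) if xbar - S <= x <= xbar + S)
fences = (q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1))
outliers = [x for x in children if x < fences[0] or x > fences[1]]

print(n, round(xbar, 2), round(S, 2), med, mode)              # 275 1.86 1.34 2 2
print(values[0], q1, med, q3, values[-1])                     # 0 1 2 3 6
print(within, f"{100 * within / n:.0f}%", fences, outliers)   # 187 68% (-2.0, 6.0) []
```

Expanding the chart is fine for a few hundred households. For very large frequencies, the weighted sums alone are enough for $\bar{x}$ and $S$.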

Solutions
Dr. Neal, WKU

1. [Calculator screens: Data in L1; Adjust WINDOW; Adjust STAT PLOT]
(a) [21, 24) has 5 scores.
(b) (i) $\mu = 23.5$. (ii) Because there are 14 scores, the median is the average of the 7th and 8th scores, which is (22 + 23)/2 = 22.5. The modes are 21 and 22 (both occur twice, and no other score occurs more than once). (iii) $\sigma \approx 4.547$. (iv) $\mu \pm \sigma$ is 18.953 to 28.047 and contains 10/14, or 71.43%, of the scores.
(c) Assuming this group is only a sample from a larger population $\Omega$: (i) $\bar{x} = 23.5$ and $S \approx 4.719$. (ii) The 1.5 IQR is from $21 - 1.5(26 - 21)$ to $26 + 1.5(26 - 21)$, or 13.5 to 33.5. The only outlier is 34. (iii) $\Omega$ = All English majors at WKU.

2. (a) [16, 17) has 35 values.
(b) (i) $\mu = 15.88$ hours. (ii) Because there are 100 measurements, the median is the average of the 50th and 51st measurements, which is (16 + 16)/2 = 16. The mode is also 16 hours because it occurs most often (23 times). (iii) $\sigma \approx 1.1898$. (iv) $\mu \pm \sigma$ is 14.69 to 17.07, which contains all students taking 15, 15.5, 16, 16.5, or 17 hours. Thus, 75 out of 100, or 75%, of the students are within one standard deviation of the average.
(c) (i) $\bar{x} = 15.88$ and $S \approx 1.19578$. (ii) $Q_1 = 15$ is the median of the values below 16, and $Q_3 = 17$ is the median of the values above 16. (iii) The 1.5 IQR is from $15 - 1.5 \times 2$ to $17 + 1.5 \times 2$, or 12 to 20, which contains all measurements. There are no outliers. (iv) $\Omega$ = All WKU freshmen.
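The Exercise 1 answers can be cross-checked with a short Python sketch using the standard `statistics` module:

- `pstdev` for the census formula (divide by $n$)
- `stdev` for the sample formula (divide by $n - 1$)
- `multimode` to report every tied mode (available in Python 3.8 and later)

The quartiles again follow the "median of the lower/upper half" rule from the text rather than any library default.

```python
from collections import Counter
from statistics import median, multimode, pstdev, stdev

# Exercise 1: Verbal ACT scores (already in increasing order).
act = [16, 18, 20, 21, 21, 22, 22, 23, 24, 25, 26, 27, 30, 34]

# Part (a): bins of length 3 starting at 15; bin k covers [15 + 3k, 18 + 3k).
k, most = Counter((x - 15) // 3 for x in act).most_common(1)[0]

# Part (b): treat the group as a census.
mu = sum(act) / len(act)
sigma = pstdev(act)                # population SD: divide by n
med = median(act)
modes = multimode(act)             # all values tied for most frequent
within = sum(1 for x in act if mu - sigma <= x <= mu + sigma)

# Part (c): treat the group as a sample.
S = stdev(act)                     # sample deviation: divide by n - 1
s = sorted(act)
q1 = median(s[: len(s) // 2])          # median of the values below the median
q3 = median(s[(len(s) + 1) // 2 :])    # median of the values above the median
fences = (q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1))
outliers = [x for x in act if not fences[0] <= x <= fences[1]]

print((15 + 3 * k, 18 + 3 * k), most)              # (21, 24) 5
print(mu, round(sigma, 3), med, modes, within)     # 23.5 4.547 22.5 [21, 22] 10
print(round(S, 3), fences, outliers)               # 4.719 (13.5, 33.5) [34]
```

The same mean appears in parts (b) and (c). Only the deviation changes, because the census and sample formulas divide by different counts.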