

Chapter 12: Describing Distributions with Numbers

Quick math overview

$\Sigma$ = sum

These expressions are algebraically equivalent:

$$\sum (x - \bar{x})^2 = \sum x^2 - \frac{\left(\sum x\right)^2}{n}$$

Examples, with $x$: {2, 3, 5, 6, 6, 8} and $n = 6$:

$\sum x = 2 + 3 + 5 + 6 + 6 + 8 = 30$

$\sum x^2 = 2^2 + 3^2 + 5^2 + 6^2 + 6^2 + 8^2 = 174$

$\left(\sum x\right)^2 = 30^2 = 900$

$\bar{x} = \frac{\sum x}{n} = \frac{30}{6} = 5$

$\sum (x - \bar{x}) = (2 - 5) + (3 - 5) + \dots + (8 - 5) = 0$

$\sum (x - \bar{x})^2 = (-3)^2 + (-2)^2 + 0^2 + 1^2 + 1^2 + 3^2 = 24$
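
The equivalence above can be checked numerically on the slide's own data. The following is a minimal sketch in Python (the variable names are illustrative and not from the slides); it computes both sides of the identity independently.

```python
# Check sum((x - mean)^2) == sum(x^2) - (sum(x))^2 / n on the slide's data.
x = [2, 3, 5, 6, 6, 8]
n = len(x)

sum_x = sum(x)                      # 30
sum_x_sq = sum(v * v for v in x)    # 174
mean = sum_x / n                    # 5.0

# Left side: sum of squared deviations from the mean.
lhs = sum((v - mean) ** 2 for v in x)
# Right side: the shortcut form that needs only sum(x) and sum(x^2).
rhs = sum_x_sq - sum_x ** 2 / n

print(lhs, rhs)  # 24.0 24.0
```

The shortcut form is convenient for hand calculation because it avoids computing each deviation separately.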

Turning Data Into Information

Center of the data:
- mean
- median
- mode

Spread of the data (variability):
- variance
- standard deviation
- range
- interquartile range

Centers of Data

Average: a single data value that represents all of the data
- mean (arithmetic average)
- median
- mode

Mean ($\bar{x}$)

Traditional measure of center. Sum the values and divide by the number of values:

$$\bar{x} = \frac{1}{n}\left(x_1 + x_2 + \dots + x_n\right) = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Median (M)

A resistant measure of the data's center.

Median: the center value of the ordered (ranked) data.
- If $n$ is odd, the median is the middle ordered value.
- If $n$ is even, the median is the average of the two middle ordered values.

Median = value at the $\frac{1}{2}(n + 1)$th position in the ordered set.

Median

Example 1 data: 2 4 6. Median (M) = 4.

Example 2 data: 2 4 6 8. Median = 5 (average of 4 and 6).

Example 3 data: 6 2 4. Median $\neq$ 2 (order the values: 2 4 6, so Median = 4).

Example: number of minutes waiting for the PRT ($n = 8$)

$x$: {5, 11, 9, 15, 33, 3, 7, 12}

$$\bar{x} = \frac{5 + 11 + 9 + 15 + 33 + 3 + 7 + 12}{8} = 11.875$$

Median: RANK DATA FIRST! {3, 5, 7, 9, 11, 12, 15, 33}

The median is at the $\frac{1}{2}(n + 1)$th position: $(8 + 1)/2 = 4\tfrac{1}{2}$. The $4\tfrac{1}{2}$th position is half-way between 9 and 11, so $(9 + 11)/2 = 10$.

Median = 10
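
The ranking rule above can be sketched in a few lines of Python. This is an illustrative implementation, not code from the course; the function name `median` and the variable `prt_wait` are assumptions introduced here.

```python
def median(values):
    """Median via the (n + 1) / 2 position rule on the ordered data."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)   # rank the data first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]    # odd n: the middle ordered value
    # even n: average of the two middle ordered values
    return (ordered[mid - 1] + ordered[mid]) / 2


prt_wait = [5, 11, 9, 15, 33, 3, 7, 12]
print(sum(prt_wait) / len(prt_wait))  # 11.875
print(median(prt_wait))               # 10.0
print(median([6, 2, 4]))              # 4
```

Sorting inside the function guards against the mistake shown in Example 3, where the middle value of the unordered list is read off as the median.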

Comparing the Mean & Median

The mean and median of data from a symmetric distribution should be close together. The actual (true) mean and median of a symmetric distribution are exactly the same.

In a skewed distribution, the mean is farther out in the long tail than is the median [the mean is pulled in the direction of the possible outlier(s)].

Mean vs. Median

Which should we use?
- Symmetric or approximately symmetric: use the mean.
- Significantly skewed: use the median.

The mean $\bar{x}$ is affected by outliers (extreme values).

Outliers?
- If it is a mistake and is documented, we can eliminate it.
- If it is not a mistake, do not eliminate it.

A statistic is robust if it is not led too far astray by a few outliers. Means (and standard deviations) are not robust.
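
To see what "not robust" means in practice, the sketch below compares the mean and median of the PRT waiting times with and without the extreme value 33. It uses Python's standard `statistics` module. Dropping 33 here is purely for illustration; the slides do not say that 33 is a mistake, so this is not a recommendation to remove it.

```python
from statistics import mean, median

prt_wait = [5, 11, 9, 15, 33, 3, 7, 12]
# Hypothetical comparison set: the same data with the extreme value left out.
without_33 = [v for v in prt_wait if v != 33]

print(mean(prt_wait), median(prt_wait))                # 11.875 10.0
print(round(mean(without_33), 2), median(without_33))  # 8.86 9
```

One extreme value moves the mean by about 3 minutes but moves the median by only 1, which is the sense in which the median is resistant.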

Mode

Observed value that occurs with the greatest frequency.
- Note: if there is no mode, write "none", not 0.
- If two modes: bimodal.

Measures of Dispersion

Spread: a general term referring to how spread out or variable a set of numbers is.
- Very large spread: {0, 100, 9999, 100000}
- No spread: {12, 12, 12, 12, 12}

Spread or Variability

If all values are the same, then they all equal the mean. There is no spread.

Variability exists when some values are different from (above or below) the mean.

We will discuss the following measures of spread: range, interquartile range, variance, standard deviation.
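
The mode rules above (report "none" rather than 0, allow two modes) can be sketched as follows. The function name `modes` is hypothetical, and treating "every value occurs exactly once" as "no mode" is an assumption about the convention the slide intends.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: report "none", not 0
    # Several values may tie for the highest frequency (e.g. bimodal data).
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 3, 5, 6, 6, 8]))  # [6]
print(modes([1, 1, 2, 2, 3]))     # [1, 2]
print(modes([5, 11, 9]))          # []
```

Returning an empty list rather than 0 keeps "no mode" distinct from data whose mode really is 0.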

Range

One way to measure spread is to give the smallest (minimum) and largest (maximum) values in the data set:

Range = max − min ("the values range from min to max")

The range is strongly affected by outliers, and is rarely used.

Quartiles

Three numbers that divide the ordered data into four equal-sized groups.
- $Q_1$ has 25% of the data below it.
- $Q_2$ has 50% of the data below it. (Median)
- $Q_3$ has 75% of the data below it.

Obtaining the Quartiles

- Order the data.
- For $Q_2$, just find the median.
- For $Q_1$, look at the lower half of the data values, those to the left of the median; find the median of this lower half.
- For $Q_3$, look at the upper half of the data values, those to the right of the median; find the median of this upper half.
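
The quartile steps above can be sketched as a median-of-halves routine. This is one reading of the slide: when $n$ is odd, the middle value is left out of both halves, which is an assumption. Software packages often use other quartile conventions and may give slightly different answers. The function names are illustrative.

```python
def _median_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    if len(values) < 2:
        raise ValueError("need at least two values to form two halves")
    ordered = sorted(values)        # order the data
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values to the left of the median
    upper = ordered[half + n % 2:]  # values to the right (skips the middle value if n is odd)
    return _median_sorted(lower), _median_sorted(ordered), _median_sorted(upper)


print(quartiles([2, 3, 5, 6, 6, 8]))     # (3, 5.5, 6)
print(quartiles([1, 2, 3, 4, 5, 6, 7]))  # (2, 4, 6)
```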

Interquartile Range (IQR)

Used to measure dispersion (spread) with the median.

Sample IQR = $Q_3 - Q_1$

Number of minutes waiting for the PRT ($n = 8$): {3, 5, 7, 9, 11, 12, 15, 33}

Recall: the median is half-way between 9 and 11, so M = 10.
- The $Q_1$ position is half-way between 5 and 7, so $Q_1 = 6$.
- $Q_3$ is half-way between 12 and 15, so $Q_3 = 13\tfrac{1}{2}$.

IQR = $Q_3 - Q_1$ = 13.5 − 6 = 7.5

The five-number summary & boxplots

Five-number summary: Min, $Q_1$, M, $Q_3$, Max
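
As a check on the PRT numbers, here is a small sketch that builds the five-number summary and the IQR with the same median-of-halves rule. It relies on `statistics.median` from the Python standard library; the function name `five_number_summary` is an assumption made for this example.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using median-of-halves quartiles."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])          # median of the lower half
    q3 = median(ordered[half + n % 2:])  # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]


prt_wait = [5, 11, 9, 15, 33, 3, 7, 12]
minimum, q1, m, q3, maximum = five_number_summary(prt_wait)
print(minimum, q1, m, q3, maximum)  # 3 6.0 10.0 13.5 33
print(q3 - q1)                      # 7.5 (the IQR)
```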

Boxplot (from Five-Number Summary)

- The central box spans $Q_1$ and $Q_3$.
- A line in the box marks the median M.
- Lines extend from the box out to the minimum and maximum.

PRT example: five-number summary and boxplot

Min = 3, $Q_1$ = 6, M = 10, $Q_3$ = 13.5, Max = 33

[Figure: boxplot of the PRT waiting times on an axis running from 0 to 33 minutes]

Variance and Standard Deviation

When variability exists, each data value has an associated deviation from the mean: $x_i - \bar{x}$

What is a typical deviation from the mean? (standard deviation)
- Small values of this typical deviation indicate small spread in the data.
- Large values of this typical deviation indicate large spread in the data.

Variance

- Find the mean.
- Find the deviation of each value from the mean.
- Square the deviations.
- Sum the squared deviations.
- Divide the sum by $n - 1$ (gives the typical squared deviation from the mean).

Variance Formula

$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Standard Deviation Formula (typical deviation from the mean)

$$s = \sqrt{\frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

[standard deviation = square root of the variance]
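
The five variance steps above translate directly into code. The sketch below applies them to the data set {2, 3, 5, 6, 6, 8} from the quick math overview, whose sum of squared deviations is 24, and cross-checks the result against `statistics.variance` from the Python standard library. The function name `sample_variance` is illustrative.

```python
import math
import statistics


def sample_variance(values):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n                        # find the mean
    deviations = [v - mean for v in values]       # deviation of each value
    squared = [d * d for d in deviations]         # square the deviations
    return sum(squared) / (n - 1)                 # sum, then divide by n - 1


x = [2, 3, 5, 6, 6, 8]
s2 = sample_variance(x)
s = math.sqrt(s2)  # standard deviation = square root of the variance

print(s2)                      # 4.8
print(round(s, 3))             # 2.191
print(statistics.variance(x))  # 4.8, the library's sample variance
```

Here $s^2 = 24 / 5 = 4.8$ and $s \approx 2.19$, so a typical value lies roughly 2 units from the mean of 5.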

Choosing a Summary

- Outliers affect the values of the mean and standard deviation.
- The five-number summary should be used to describe center and spread for skewed distributions, or when outliers are present.
- Use the mean and standard deviation for reasonably symmetric distributions that are free of outliers.

[Figure: distribution of calories in popular candy bars]

Today's concepts

Numerical Summaries
- Center (mean, median)
- Spread (variance, std. dev., range, IQR)
- Five-number summary & Boxplots
- Choosing mean versus median
- Choosing standard deviation versus five-number summary