STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

PART 1 - DATA

CHAPTER 1: LOOKING AT DATA - DISTRIBUTIONS

Individuals: the objects described by a set of data (people, animals, things). All the data for one individual make up a case.

Variable: any characteristic of an individual. A variable may take different values for different individuals.

Categorical variable: places an individual into one of several groups or categories.

Quantitative variable: takes numerical values for which arithmetic operations such as adding and averaging make sense.

Distribution: tells us what values a variable takes and how often it takes these values.

1.1 Displaying Distributions with Graphs

Exploratory data analysis: the use of statistical tools (graphs and numerical summaries) and ideas to examine data and describe their main features.
- examine each variable and the relationships among variables
- construct graphs and add numerical summaries

Graphs for categorical variables
- Bar graph: the order of the bars is not important
- Pie chart: must include all the parts that make up the whole
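As a rough illustration of what "distribution" means for a categorical variable, the sketch below tabulates how often each category occurs and prints a crude text bar graph. The eye-color data and the function name `categorical_distribution` are invented for this example and do not come from the notes.

```python
from collections import Counter


def categorical_distribution(values):
    """Return {category: (count, percent)} for a list of category labels."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}


# Hypothetical data: eye color recorded for ten individuals.
eye_colors = ["brown", "blue", "brown", "green", "brown",
              "blue", "brown", "hazel", "blue", "brown"]

dist = categorical_distribution(eye_colors)
for category, (count, percent) in sorted(dist.items()):
    # Crude text bar graph: one '#' per individual in the category.
    print(f"{category:<6} {'#' * count:<6} {count:2d}  {percent:5.1f}%")
```

Because the percents describe parts of one whole, they add to 100, which is the condition a pie chart needs.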

Measuring the speed of light (Newcomb's experiment)

Measurement: depends on the instrument used to make it; consider whether the measurement is appropriate for its purpose.

Variation: differences among measurements may be due to many factors.

Distribution: the pattern of variation of a variable. The distribution of a quantitative variable records its numerical values and how often each value occurs.

Stemplot: gives a quick picture of a distribution while including the actual numerical values in the graph.
1. Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, the final digit. A stem can have 1, 2, or more digits; a leaf has only one digit.
2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column.
3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.

Back-to-back stemplot: uses one set of stems and two sets of leaves, one on either side of the stems. It helps to compare two data sets.

Splitting stems: the number of stems can be doubled by splitting each stem in two, one with leaves 0 to 4 and the other with leaves 5 to 9.

Rounding: it is a good idea to round numbers to only a few digits before making a stemplot, at the cost of some accuracy in the measurements.

Examining a distribution
1. In any graph of data, look for the overall pattern and for striking deviations from that pattern.

2. The overall pattern of a distribution can be described by its shape, center, and spread.
3. An outlier is an important deviation that falls outside the overall pattern.

Mode(s): the observation(s) that occur most often, shown by the major peak(s) in the graph.

Unimodal: a distribution with one major peak.

Symmetric: a distribution in which the values smaller and larger than its midpoint are mirror images of each other.

Skewed to the right: the right tail (larger values) is longer than the left tail (smaller values).

Skewed to the left: the left tail (smaller values) is longer than the right tail (larger values).

Histogram: breaks the range of values of a variable into intervals of equal width and displays only the count (frequency) or percent (relative frequency) of the observations that fall into each interval.

Frequency table: a table showing the intervals with their respective frequencies or relative frequencies. Roundoff error may sometimes be significant; for example, rounded percents may not add to exactly 100.

Looking at data: a histogram helps to show the shape, center, and spread of a distribution, and any outliers.

Time plot: plots the measurements in the order in which they were observed (over time).

Time series: measurements of a variable taken at regular intervals over time, for example economic and social data.

Seasonal variation: a pattern in a time series that repeats itself at known regular intervals of time.

Trend: a persistent long-term rise or fall.

[Figure: monthly consumer price index for some product]
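The stemplot steps and the histogram frequency table described above can be sketched in a few lines. This is a minimal sketch that assumes non-negative whole-number data; the scores, the function names `stemplot` and `frequency_table`, and the choice of class limits are all invented for illustration.

```python
from collections import defaultdict


def stemplot(values):
    """Return stemplot rows for non-negative integers.

    The stem is all but the final digit; the leaf is the final digit.
    """
    leaves = defaultdict(list)
    for v in sorted(values):          # sorted, so leaves come out in increasing order
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    rows = []
    # List every stem from smallest to largest, even one with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>3} | {row}")
    return rows


def frequency_table(values, start, width, n_classes):
    """Count observations in equal-width classes [lower, lower + width).

    Returns (lower, upper, count, percent) for each class.
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    total = len(values)
    return [(start + i * width, start + (i + 1) * width, c, 100.0 * c / total)
            for i, c in enumerate(counts)]


# Hypothetical test scores.
scores = [12, 15, 21, 24, 24, 27, 30, 33, 35, 38, 41, 58]

rows = stemplot(scores)
print("\n".join(rows))

table = frequency_table(scores, start=10, width=10, n_classes=5)
for lower, upper, count, percent in table:
    print(f"{lower:>3} to <{upper:<3} {count:2d}  {percent:5.1f}%")
```

The stemplot keeps every data value visible, while the frequency table keeps only the counts per interval, which is exactly the information a histogram displays.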

Index number: a nationwide average price, which is less variable than the price at any one store (a store may from time to time offer special prices).

Seasonally adjusted: seasonal adjustment helps to avoid misinterpretation, especially over short periods of time.

Decomposing time series: statistical software can help to examine a time series by decomposing the data into systematic patterns, such as trends and seasonal variation, and the residuals that remain after these patterns are removed.

1.2 Describing Distributions with Numbers

Measures of center

1. Mean: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$

2. Median: M. The median is the midpoint of the distribution, the number such that half the observations are smaller and the other half are larger.

To find the median:
1. Arrange the observations in increasing order.
2. If the number of observations n is odd, the median is the center observation, at position (n+1)/2 in the ordered list.
3. If the number of observations n is even, the median is the mean of the two center observations in the ordered list. The position (n+1)/2 from step 2 then falls midway between these two observations.

The mean is affected by extreme observations, whereas the median is affected very little. The median is therefore called a resistant measure; the mean is not resistant.

Measuring spread: quartiles

Quartiles divide the distribution into 4 equal parts.
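Returning to the measures of center defined above, the following sketch computes the mean and the median by the stated rules and shows why the median is called resistant. The data values and the function names are invented for illustration.

```python
def mean(xs):
    """Arithmetic average: the sum of the observations divided by n."""
    return sum(xs) / len(xs)


def median(xs):
    """Midpoint of the ordered observations."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the center observation, position (n+1)/2 counting from 1.
        return ordered[mid]
    # n even: the mean of the two center observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical observations.
data = [2, 4, 5, 7, 9, 12, 15]
# The same data with the largest value replaced by an extreme observation.
contaminated = [2, 4, 5, 7, 9, 12, 150]

print("mean:  ", mean(data), "->", mean(contaminated))
print("median:", median(data), "->", median(contaminated))
```

Replacing one value with an extreme one moves the mean a long way but leaves the median where it was.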

To calculate the quartiles:
1. Arrange the observations in increasing order and find the median, which is the second quartile Q2. 50% of the observations are to its left.
2. The first quartile Q1 is the median of the observations to the left of the median. 25% of the observations are to its left.
3. The third quartile Q3 is the median of the observations to the right of the median. 75% of the observations are to its left.

Percentiles divide the distribution into 100 equal parts:
- 25th percentile = Q1
- 50th percentile = Q2 = M
- 75th percentile = Q3

Range: the highest score minus the lowest score.

Interquartile range: the highest quartile minus the lowest quartile, $IQR = Q_3 - Q_1$.

An observation is a suspected outlier if it falls more than $1.5 \times IQR$ above Q3 or below Q1.

Five-number summary: Minimum, Q1, M (= Q2), Q3, Maximum, in that order.

Boxplot: a graph of the five-number summary with suspected outliers plotted individually. It is useful for comparing distributions.
1. A central box spans the quartiles.
2. A line in the box marks the median.
3. Observations more than $1.5 \times IQR$ above Q3 or below Q1 are plotted individually as outliers.
4. Lines extend from the box out to the smallest and largest observations that are not suspected outliers.
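The quartile procedure, the five-number summary, and the 1.5 × IQR rule can be sketched as follows. The sketch follows the rule stated in these notes (Q1 and Q3 are medians of the observations to the left and right of the median). The data and function names are invented for illustration, and statistical software may use a different quartile convention and so give slightly different values.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(xs):
    """Return (minimum, Q1, M, Q3, maximum)."""
    ordered = sorted(xs)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                # observations to the left of the median
    upper = ordered[half + (n % 2):]      # observations to the right of the median
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


def suspected_outliers(xs):
    """Observations more than 1.5 x IQR above Q3 or below Q1."""
    _, q1, _, q3, _ = five_number_summary(xs)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in xs if x < low_fence or x > high_fence]


# Hypothetical observations.
data = [5, 7, 8, 10, 12, 13, 15, 18, 20, 45]

summary = five_number_summary(data)
outliers = suspected_outliers(data)
print("five-number summary:", summary)
print("suspected outliers: ", outliers)
```

In a boxplot of these data, the upper line would stop at the largest observation that is not a suspected outlier, and any suspected outlier would be drawn as a separate point.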

The variance $s^2$ of a set of observations is the average of the squares of the deviations of the observations from their mean, where the average uses the divisor n-1:

$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n-1} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

Hence, the standard deviation is

$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$

where $x_1$ to $x_n$ are the observations and n-1 is the degrees of freedom.

Properties
1. s measures spread about the mean and should be used only when the mean is chosen as the measure of center.
2. s = 0 only when there is no spread, that is, when all observations have the same value. Otherwise s > 0, and more spread about the mean gives a larger s.
3. s, like the mean, is not resistant. A few outliers can make s very large.

Linear transformation: changes the original variable x into a new variable $x_{new} = a + bx$ (the equation of a straight line).
- the constant a shifts all the values of x upward or downward by a units
- the positive constant b changes the size of the unit of measurement
- linear transformations do not change the shape of a distribution

Effect of a linear transformation

To see the effects of a linear transformation on measures of center and spread, apply these rules:

1. Multiplying each observation by a positive number b multiplies both measures of center (mean and median) and measures of spread (interquartile range and standard deviation) by b.
2. Adding the same number a (positive or negative) to each observation adds a to measures of center and to quartiles and other percentiles, but does not change measures of spread.

1.3 The Normal Distributions

Strategy for exploring data
1. Always plot the data (stemplot or histogram).
2. Look for the overall pattern and striking deviations (outliers).
3. Calculate a numerical summary to describe center and spread.
4. Draw a smooth curve approximately through the tops of the bars in the histogram.

A density curve is a curve that
1. is always on or above the horizontal axis
2. has area exactly 1 underneath it

A density curve describes the overall pattern of a distribution. The area under the curve and above any range of values is the relative frequency of all observations that fall in that range.

Measuring center and spread for density curves

If the density curve is symmetric with a single peak, the mean, median, and mode are all the same x value, the one under the highest peak.

Median and mean of a density curve
1. The median has an area of 0.5 on each side.
2. The mean is the balance point.
3. If the curve is skewed to the right, the measures are in the order mode, median, mean (the mean is pulled to the right). If the curve is skewed to the left, the measures are in the order mean, median, mode (the mean is pulled to the left).

The mean of a population (idealized distribution) is µ.
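The standard deviation formula and the two linear-transformation rules given earlier in these notes can be checked numerically. The sketch below uses the conversion from degrees Celsius to degrees Fahrenheit (a = 32, b = 1.8) as an example of x_new = a + bx. The temperature data and the helper names are invented for illustration.

```python
import math
import statistics


def sample_sd(xs):
    """Standard deviation with divisor n - 1, as in the formula for s."""
    n = len(xs)
    x_bar = sum(xs) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))


def linear_transform(xs, a, b):
    """Apply x_new = a + b * x to every observation."""
    return [a + b * x for x in xs]


# Hypothetical temperatures in degrees Celsius.
celsius = [10.0, 15.0, 20.0, 25.0, 40.0]
fahrenheit = linear_transform(celsius, a=32.0, b=1.8)
shifted = linear_transform(celsius, a=5.0, b=1.0)   # shift only, no rescaling

# Center is shifted by a and rescaled by b.
print(statistics.mean(celsius), statistics.mean(fahrenheit))
print(statistics.median(celsius), statistics.median(fahrenheit))
# Spread is rescaled by b and is unaffected by a.
print(sample_sd(celsius), sample_sd(fahrenheit), sample_sd(shifted))
```

The hand-written `sample_sd` should agree with `statistics.stdev`, which also uses the divisor n - 1.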

The standard deviation of a population (idealized distribution) is σ.

The normal curve has equation:

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$

The 68-95-99.7 rule

In the normal distribution with mean µ and standard deviation σ:
1. 68% of the observations fall within σ of the mean µ.
2. 95% of the observations fall within 2σ of the mean µ.
3. 99.7% of the observations fall within 3σ of the mean µ.

Standardizing observations

If x is an observation from a distribution that has mean µ and standard deviation σ, the standardized value of x is

$z = \frac{x - \mu}{\sigma}$

called a z-score.

Standard normal distribution N(0, 1): the normal distribution with mean 0 and standard deviation 1. If the variable X has any normal distribution N(µ, σ) with mean µ and standard deviation σ, then the standardized variable

$Z = \frac{X - \mu}{\sigma}$

has the standard normal distribution.

The standard normal table gives the area under the curve to the left of a given z-score. This area is often interpreted as a probability. Every X value must be standardized before the standard normal table can be used to compute probabilities.

Normal quantile plot

- a very sensitive way to assess normality, but not easily done by hand
- computer software allows us to construct a more accurate plot quickly

If the points on a normal quantile plot lie close to a straight line, the plot indicates that the data are normal. Systematic deviations from a straight line indicate a nonnormal distribution. Outliers appear as points that are far away from the overall pattern of the plot.

To construct a normal quantile plot:
1. Arrange the observed data values from smallest to largest. Record what percentile of the data each value occupies. E.g., for 20 observations, the first is at the 5% point, the next is at the 10% point, and so on.
2. Find the z-score for each of these percentiles. E.g., z = -1.645 is the 5% point of the standard normal distribution.
3. Plot each data point x against the corresponding z. If the data distribution is close to standard normal, the plotted points will lie close to the 45° line x = z. If the data distribution is close to any normal distribution, the plotted points will lie close to some straight line.

Granularity: occurs when plotted points appear to form horizontal segments in the plot. This does not hold us back from adopting a normal distribution for the data.
- Granularity could be avoided if the measurements were taken more accurately.
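The normal calculations in this section (the 68-95-99.7 rule, z-scores, and the coordinates of a normal quantile plot) can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). Two things here are assumptions rather than part of the notes: the height data are invented, and each ordered value is assigned the percentile (i - 0.5)/n instead of i/n, so that the largest value does not land on the 100% point, where the z-score would be infinite.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_score(x, mu, sigma):
    """Standardized value z = (x - mu) / sigma."""
    return (x - mu) / sigma


def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k)


def normal_quantile_points(data):
    """Return (z, x) pairs for a normal quantile plot.

    Assumption: the i-th smallest of n values sits at percentile (i - 0.5) / n.
    """
    ordered = sorted(data)
    n = len(ordered)
    points = []
    for i, x in enumerate(ordered, start=1):
        p = (i - 0.5) / n
        points.append((STD_NORMAL.inv_cdf(p), x))
    return points


# Areas within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(k, round(area_within(k), 4))

# The 5% point of the standard normal distribution.
five_percent_point = STD_NORMAL.inv_cdf(0.05)
print(round(five_percent_point, 3))

# Hypothetical heights in inches.
heights = [61, 64, 65, 66, 67, 68, 68, 70, 72, 75]
points = normal_quantile_points(heights)
for z, x in points:
    print(f"{z:6.2f}  {x}")
```

Plotting x against z with any plotting tool gives the normal quantile plot. Repeated values such as the two 68s show up as short horizontal runs, which is the granularity described above.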