Chapter 2 Class Notes Sample & Population Descriptions Classifying variables

Similar documents
Chapter2 Description of samples and populations. 2.1 Introduction.

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics

Description of Samples and Populations

Units. Exploratory Data Analysis. Variables. Student Data

Example 2. Given the data below, complete the chart:

CHAPTER 2 Description of Samples and Populations

Chapter 2: Tools for Exploring Univariate Data

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

Review for Exam #1. Chapter 1. The Nature of Data. Definitions. Population. Sample. Quantitative data. Qualitative (attribute) data

Section 3. Measures of Variation

ST Presenting & Summarising Data Descriptive Statistics. Frequency Distribution, Histogram & Bar Chart

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data

Describing distributions with numbers

P8130: Biostatistical Methods I

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Practice problems from chapters 2 and 3

Descriptive Univariate Statistics and Bivariate Correlation

STOR 155 Introductory Statistics. Lecture 4: Displaying Distributions with Numbers (II)

MATH 1150 Chapter 2 Notation and Terminology

Chapter 3. Data Description

Elementary Statistics

Chapters 1 & 2 Exam Review

Tastitsticsss? What s that? Principles of Biostatistics and Informatics. Variables, outcomes. Tastitsticsss? What s that?

Slide 1. Slide 2. Slide 3. Pick a Brick. Daphne. 400 pts 200 pts 300 pts 500 pts 100 pts. 300 pts. 300 pts 400 pts 100 pts 400 pts.

Introduction to Statistics

Sections 2.3 and 2.4

Statistics I Chapter 2: Univariate data analysis

Describing distributions with numbers

Full file at

STAT 200 Chapter 1 Looking at Data - Distributions

Learning Objectives for Stat 225

Biostatistics for biomedical profession. BIMM34 Karin Källen & Linda Hartman November-December 2015

MATH 117 Statistical Methods for Management I Chapter Three

Statistics I Chapter 2: Univariate data analysis

Z score indicates how far a raw score deviates from the sample mean in SD units. score Mean % Lower Bound

Descriptive statistics

2.1 Measures of Location (P.9-11)

Chapter 4. Displaying and Summarizing. Quantitative Data

Chapter 1. Looking at Data

1. Exploratory Data Analysis

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable

CIVL 7012/8012. Collection and Analysis of Information

Chapter 3. Measuring data

Math 223 Lecture Notes 3/15/04 From The Basic Practice of Statistics, bymoore

Chapter 2 Solutions Page 15 of 28

Lecture 1: Description of Data. Readings: Sections 1.2,

2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

A is one of the categories into which qualitative data can be classified.

3 Lecture 3 Notes: Measures of Variation. The Boxplot. Definition of Probability

Probability Distributions

Unit 2. Describing Data: Numerical

2011 Pearson Education, Inc

BNG 495 Capstone Design. Descriptive Statistics

Preliminary Statistics course. Lecture 1: Descriptive Statistics

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Types of Information. Topic 2 - Descriptive Statistics. Examples. Sample and Sample Size. Background Reading. Variables classified as STAT 511

Introduction to Statistics

Describing Distributions with Numbers

Exploring, summarizing and presenting data. Berghold, IMI, MUG

Descriptive Statistics

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- #

Recap: Ø Distribution Shape Ø Mean, Median, Mode Ø Standard Deviations

Lecture 11. Data Description Estimation

TOPIC: Descriptive Statistics Single Variable

Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

Averages How difficult is QM1? What is the average mark? Week 1b, Lecture 2

Review: Central Measures

Announcements. Lecture 1 - Data and Data Summaries. Data. Numerical Data. all variables. continuous discrete. Homework 1 - Out 1/15, due 1/22

Statistics in medicine

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode.

Contents. Acknowledgments. xix

Measures of the Location of the Data

(quantitative or categorical variables) Numerical descriptions of center, variability, position (quantitative variables)

The empirical ( ) rule

Resistant Measure - A statistic that is not affected very much by extreme observations.

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

Clinical Research Module: Biostatistics

Descriptive Data Summarization

Exploratory data analysis: numerical summaries

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

2 Descriptive Statistics

Quantitative Tools for Research

Math 221, REVIEW, Instructor: Susan Sun Nunamaker

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

Math 14 Lecture Notes Ch Percentile

Lecture 2 and Lecture 3

3.1 Measure of Center

Describing Distributions

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

In this investigation you will use the statistics skills that you learned the to display and analyze a cup of peanut M&Ms.

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Review for Final. Chapter 1 Type of studies: anecdotal, observational, experimental Random sampling

1 Descriptive Statistics

Exercises from Chapter 3, Section 1

The Normal Distribution. Chapter 6

Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions

Stat Lecture Slides Exploring Numerical Data. Yibi Huang Department of Statistics University of Chicago

Chapter 3 Statistics for Describing, Exploring, and Comparing Data. Section 3-1: Overview. 3-2 Measures of Center. Definition. Key Concept.

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected

Transcription:

Chapter 2 Class Notes Sample & Population Descriptions Classifying variables Random Variables (RVs) are discrete quantitative continuous nominal qualitative ordinal Notation and Definitions: a Sample is a collection of objects a Sample is of size n ; Population size is N we measure a response variable (denoted Y) Some graphical displays Frequency Distributions ( 2.2) Bar charts for qualitative variables (see pp.28 29) Frequency histograms for quantitative variables (see below) Sow Litter Size (discrete; p. 31) Histogram of Number of Piglets 9 8 7 Frequency 6 5 4 3 2 1 6 8 1 Number of Piglets 12 14 1 Page

A frequency histogram for a continuous variable is the Serum CK example on p.33; note the right skew in this distribution. The sample mean for these data is 98.3 and the sample median is 94.5; indeed, the mean is to the right of the median. This will always be the case. And the reverse is true when the distribution is skewed to the left. 2.3 Measuring the Center (Mean, Median, Mode) Lamb weight data (p.4): y 1 = 11, y 2 = 13, y 3 = 19, y 4 = 2, y 5 = 1, y 6 = 1 The sample average or sample mean is The sample median is the 5 th percentile or center value, and the sample mode is the most common value. Sample measures such as the mean, median and mode are called sample statistics. For the lamb weight data, the sample mean is. To find the median, we order the values: 1 2 1 11 13 19 and we average the center two values (since n = 6 is even); thus, the median is 1.5 pounds. 2 Page

We will use the convention given in the main text on pp.45 46 to find the median as well as the first quartile (Q 1 ) and the third quartile (Q 3 ). Then, the interquartile range is IQR = Q 3 Q 1. Also, Boxplots ( 2.4 on pp.45 51)) graphically show the minimum and maximum values as well as Q 1, the median (Q 2 ), and Q 3. See text p.48 for an example. On pp. 48 49, the issue of Outliers is raised: note that this is a somewhat controversial subject. Read 2.5 on relationships between variables (categoricalcategorical, numeric categorical, and numeric numeric). 2.6 Measuring Variability or Dispersion One such measure is the range another is the IQR (which is more robust). But more common/useful is the sample standard deviation ( SD or s ), which is the positive square root of the sample variance (s 2 ). The sample variance is: We call the component the deviation associated with the observation y k. To illustrate, for the lamb data ( y 9. 333 ): 11 13 19 2 1 1 the deviations are 1.667 3.667 9.667 7.333.667 8.333 (these always must sum to zero), so that the sample variance is 3 Page

......... and the sample SD is s = 6.8313 pounds. Further, the coefficient of variation (useful in ecology, agriculture, and medicine) is %.. %. % In practice, to find the sample variance, we usually use the shortcut formula: To illustrate, for the Serum CK values on p.49 bottom, n = 36, y k = 3,538, y = 98.2778, and yk 2 = 44,784, so that,. Old Quiz Exercise: Ten women were asked how many hours per week they exercise. Their answers were as follows: 5 13 3 6 14 3 1 3 8 4 Calculate the coefficient of variation for these data. Hint: the sum of these data is 6 and the sum of their squares is 534. [Answer is (4.397/6)*1% = 73.3%] 4 Page

An important application of the above is the Empirical Rule (p.46), which states that for a nicely shaped distribution about 68% of the sample values will lie in the interval from to within one SD of the mean about 95% of the sample values will lie in the interval from to within two SD s of the mean > 99% of the sample values will lie in the interval from to within three SD s of the mean To illustrate, for the Serum CK values = 98.278 and s = 4.383, so the Empirical Rule says that approximately 68% of the sample values will lie in the interval (57.9, 138.66); approximately 95% of the sample values will lie in the interval (17.51, 179.4); and over 99% of the sample values will lie in the interval ( 22.87, 219.43) By returning to the actual data on p.68 and counting, we see that the actual percentages in these respective intervals are 26 / 36 = 72.2% 34 / 36 = 94.4% 36 / 36 = 1% Note that these values are indeed close to those predicted by the Empirical Rule. Read on p.65 how you can estimate the mean and SD from a histogram; note Example 2.6.1 on p.66. 5 Page

2.7: Sometimes, obtaining a transformation of a RV (Y) is helpful for example to achieve a more symmetric distribution of the transformed variable (Y'). A linear transformation is used to convert degrees Celsius to degrees Fahrenheit; taking a square root or a logarithm involves a nonlinear transformation. The Cricket Singing Time (Y) data of p.72 can be transformed using the natural logarithm transformation [ = log(y)]; see the following plots. Histogram of time, log-time 25 tim e 2-1 1 log-tim e 2 3 2 15 Frequency 15 1 1 5 5 6 12 18 24 2.8: As mentioned, the following diagram is helpful to visualize an important application of statistical science called Statistical Inference. A representative Sample is taken from the Population (so as to remove any Bias), and the sample Statistic (such as sample mean) is used to estimate the population Parameter. 6 Page

For example, a selected group of n = 25 Loyola undergraduates had an average age of y = 19.4 and pˆ = 64.% were Female. Making statements about all Loyola undergraduates falls under the heading of statistical inference. It s important to consider if the sample is representative and consider whether the sampling methodology may result in any bias. Sometimes defining the population can be challenging. For example, if the sample was chosen at random by knocking on doors in the student dormitories, these issues may be very important. Note that a sample characteristic is called a statistic and a population characteristic is called a parameter. Some examples include the following (see p.79): Characteristic Sample Statistic Population Parameter Mean SD (standard deviation) s Proportion p 7 Page