The normal distribution

Similar documents
The normal distribution

Descriptive statistics

The Central Limit Theorem

The Central Limit Theorem

Chapter 3: The Normal Distributions

Reminders. Homework due tomorrow Quiz tomorrow

The Normal Distribution. MDM4U Unit 6 Lesson 2

Power and sample size calculations

Central Limit Theorem and the Law of Large Numbers Class 6, Jeremy Orloff and Jonathan Bloom

Inference for Single Proportions and Means T.Scofield

EQ: What is a normal distribution?

Continuous random variables

Statistics 528: Homework 2 Solutions

Z-tables. January 12, This tutorial covers how to find areas under normal distributions using a z-table.

Chapter 6. The Standard Deviation as a Ruler and the Normal Model 1 /67

Math 147 Lecture Notes: Lecture 12

The Standard Deviation as a Ruler and the Normal Model

Chapter 3. Measuring data

Density Curves and the Normal Distributions. Histogram: 10 groups

Are data normally normally distributed?

11. The Normal distributions

Finding Quartiles. . Q1 is the median of the lower half of the data. Q3 is the median of the upper half of the data

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

The Normal Table: Read it, Use it

(i) The mean and mode both equal the median; that is, the average value and the most likely value are both in the middle of the distribution.

Section 3.4 Normal Distribution MDM4U Jensen

Stat Lecture Slides Exploring Numerical Data. Yibi Huang Department of Statistics University of Chicago

Chapter. Numerically Summarizing Data. Copyright 2013, 2010 and 2007 Pearson Education, Inc.

STA 218: Statistics for Management

Measures of Central Tendency and their dispersion and applications. Acknowledgement: Dr Muslima Ejaz

MATH 1150 Chapter 2 Notation and Terminology

2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

The t-distribution. Patrick Breheny. October 13. z tests The χ 2 -distribution The t-distribution Summary

Section 5.4. Ken Ueda

CHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the

(i) The mean and mode both equal the median; that is, the average value and the most likely value are both in the middle of the distribution.

Francine s bone density is 1.45 standard deviations below the mean hip bone density for 25-year-old women of 956 grams/cm 2.

The Normal Distribution. Chapter 6

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

Units. Exploratory Data Analysis. Variables. Student Data

BIOS 2041: Introduction to Statistical Methods

Standard Normal Calculations

Learning Targets: Standard Form: Quadratic Function. Parabola. Vertex Max/Min. x-coordinate of vertex Axis of symmetry. y-intercept.

200 participants [EUR] ( =60) 200 = 30% i.e. nearly a third of the phone bills are greater than 75 EUR

The Normal Distribution

Estimating a population mean

Business Statistics: A Decision-Making Approach 6 th Edition. Chapter Goals

The t-test Pivots Summary. Pivots and t-tests. Patrick Breheny. October 15. Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/18

Describing distributions with numbers

STA 218: Statistics for Management

MA 1125 Lecture 15 - The Standard Normal Distribution. Friday, October 6, Objectives: Introduce the standard normal distribution and table.

Study and research skills 2009 Duncan Golicher. and Adrian Newton. Last draft 11/24/2008

Sampling, Frequency Distributions, and Graphs (12.1)

Exam #2 Results (as percentages)

Chapter 9 Regression. 9.1 Simple linear regression Linear models Least squares Predictions and residuals.

7 Estimation. 7.1 Population and Sample (P.91-92)

How spread out is the data? Are all the numbers fairly close to General Education Statistics

Final Exam - Solutions

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Chapter 2: Tools for Exploring Univariate Data

Correlation. Patrick Breheny. November 15. Descriptive statistics Inference Summary

Logarithmic Functions. 4. f(f -1 (x)) = x and f -1 (f(x)) = x. 5. The graph of f -1 is the reflection of the graph of f about the line y = x.

The empirical ( ) rule

CHAPTER 7 THE SAMPLING DISTRIBUTION OF THE MEAN. 7.1 Sampling Error; The need for Sampling Distributions

Statistical Inference

Recall that the standard deviation σ of a numerical data set is given by

Count data and the Poisson distribution

Elementary Statistics

Multiple samples: Modeling and ANOVA

CHAPTER 1. Introduction

23.3. Sampling Distributions. Engage Sampling Distributions. Learning Objective. Math Processes and Practices. Language Objective

Statistical Concepts. Constructing a Trend Plot

MODULE 9 NORMAL DISTRIBUTION

GRE Quantitative Reasoning Practice Questions

Princeton University Press, all rights reserved. Chapter 10: Dynamics of Class-Structured Populations

Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions

CHAPTER 5 Probabilistic Features of the Distributions of Certain Sample Statistics

value of the sum standard units

6 THE NORMAL DISTRIBUTION

Statistics lecture 3. Bell-Shaped Curves and Other Shapes

Definition: Quadratic equation: A quadratic equation is an equation that could be written in the form ax 2 + bx + c = 0 where a is not zero.

MALLOY PSYCH 3000 MEAN & VARIANCE PAGE 1 STATISTICS MEASURES OF CENTRAL TENDENCY. In an experiment, these are applied to the dependent variable (DV)

Biostatistics in Dentistry

Two-sample Categorical data: Testing

Exponential, Gamma and Normal Distribuions

Section 3. Measures of Variation

Statistic: a that can be from a sample without making use of any unknown. In practice we will use to establish unknown parameters.

One-sample categorical data: approximate inference

STAT 155 Introductory Statistics. Lecture 6: The Normal Distributions (II)

4/19/2009. Probability Distributions. Inference. Example 1. Example 2. Parameter versus statistic. Normal Probability Distribution N

Lesson #33 Solving Incomplete Quadratics

Using the z-table: Given an Area, Find z ID1050 Quantitative & Qualitative Reasoning

Central Limit Theorem for Averages

Data Analysis and Statistical Methods Statistics 651

8.1 Frequency Distribution, Frequency Polygon, Histogram page 326

Z score indicates how far a raw score deviates from the sample mean in SD units. score Mean % Lower Bound

Stat 20 Midterm 1 Review

Section 3.2 Measures of Central Tendency

Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

Transcription:

The normal distribution Patrick Breheny March 3 Patrick Breheny to Biostatistics (BIOS 4120) 1/25

A common histogram shape Histograms of infant mortality rates, heights, and cholesterol levels: Africa NHANES (adult women) NHANES (adult women) 12 700 Frequency 10 8 6 4 Frequency 600 400 200 Frequency 600 500 400 300 200 2 100 0 0 0 0 50 100 150 200 Infant mortality rate 55 60 65 70 Height (inches) 50 100 150 200 250 LDL cholesterol What do these histograms have in common? Patrick Breheny to Biostatistics (BIOS 4120) 2/25

The normal curve Mathematicians discovered long ago that the equation y = 1 2π e 2 /2 described the histograms of many random variables y Patrick Breheny to Biostatistics (BIOS 4120) 3/25

Features of the normal curve The normal curve is symmetric around = 0 The normal curve is always positive The normal curve drops rapidly down near zero as moves away from 0 Patrick Breheny to Biostatistics (BIOS 4120) 4/25

The normal curve in action Africa NHANES (adult women) NHANES (adult women) Infant mortality rate (standard units) Height (standard units) HDL cholesterol (standard units) Technical note: The data has been standardized and the vertical ais is now density Data whose histogram looks like the normal curve are said to be normally distributed or to follow a normal distribution Patrick Breheny to Biostatistics (BIOS 4120) 5/25

Probabilities from the normal curve Probabilities are given by the area under the normal curve: Patrick Breheny to Biostatistics (BIOS 4120) 6/25

The 68%/95% rule This is where the 68%/95% rule of thumb that we discussed earlier comes from: P=68% P=95% P=100% Patrick Breheny to Biostatistics (BIOS 4120) 7/25

Calculating probabilities By knowing that the total area under the normal curve is 1, we can get a rough idea of the area under a curve by looking at a plot However, to get eact numbers, we will need a computer How much area is under this normal curve? is an etremely common question in statistics, and programmers have developed algorithms to answer this question very quickly: The output from these algorithms is commonly collected into tables, which is what you will have to use for eams Patrick Breheny to Biostatistics (BIOS 4120) 8/25

Calculating the area under a normal curve, eample 1 Find the area under the normal curve between 0 and 1.84.5 =.34 > pnorm(1) - pnorm(0) [1] 413447 Patrick Breheny to Biostatistics (BIOS 4120) 9/25

Calculating the area under a normal curve, eample 2 Find the area under the normal curve above 1 1.84 =.16 > 1-pnorm(1) [1] 586553 Patrick Breheny to Biostatistics (BIOS 4120) 10/25

Calculating the area under a normal curve, eample 3 Find the area under the normal curve that lies outside -1 and 1 1 - (.84-.16) =.32 Alternatively, we could have used symmetry: 2(.16)=.32 > 2*pnorm(-1) [1] 173105 Patrick Breheny to Biostatistics (BIOS 4120) 11/25

Calculating percentiles A related question of interest is, What is the th percentile of the normal curve? This is the opposite of the earlier question: instead of being given a value and asked to find the area to the left of the value, now we are told the area to the left and asked to find the value With a table, we can perform this inverse search by finding the probability in the body of the table, then looking to the margins to find the percentile associated with it Patrick Breheny to Biostatistics (BIOS 4120) 12/25

Calculating percentiles (cont d) What is the 60th percentile of the normal curve? There is no.600 in the table, but there is a.599, which corresponds to 5 The real 60th percentile must lie between 5 and 6 (it s actually 533) For this class, 5, 6, or anything in between is an acceptable answer Or we could obtain an eact answer from R: > qnorm(0.6) [1] 533471 How about the 10th percentile? The 10th percentile is -1.28 > qnorm() [1] -1.281552 Patrick Breheny to Biostatistics (BIOS 4120) 13/25

Calculating values such that a certain area lies within/outside them Find the number such that the area outside and is equal to 10% Our answer is therefore ±1.645 (the 5th/95th percentile) Patrick Breheny to Biostatistics (BIOS 4120) 14/25

Reconstructing a histogram In week 2, we said that the mean and standard deviation provide a two-number summary of a histogram We can now make this observation a little more concrete Anything we could have learned from a histogram, we will now determine by approimating the real distribution of the data by the normal distribution This approach is called the normal approimation Patrick Breheny to Biostatistics (BIOS 4120) 15/25

NHANES adult women The data set we will work with on these eamples is the NHANES sample of the heights of 2,649 adult women The mean height is 63.5 inches The standard deviation of height is 2.75 inches Patrick Breheny to Biostatistics (BIOS 4120) 16/25

Procedure: Probabilities using the normal curve The procedure for calculating probabilities with the normal approimation is as follows: #1 Draw a picture of the normal curve and shade in the appropriate probability #2 Convert to standard units: letting denote a number in the original units and z a number in standard units, z = SD where is the mean and SD is the standard deviation #3 Determine the area under the normal curve using a table or computer Patrick Breheny to Biostatistics (BIOS 4120) 17/25

Estimating probabilities: Eample # 1 Suppose we want to estimate the percent of women who are under 5 feet tall 5 feet, or 60 inches, is 3.5/2.75=1.27 standard deviations below the mean Using the normal distribution, the probability of more than 1.27 standard deviations below the mean is P ( < 1.27) = 1% In the actual sample, 282 out of 2,649 women were under 5 feet tall, which comes out to 10.6% Patrick Breheny to Biostatistics (BIOS 4120) 18/25

Estimating probabilities: Eample # 2 Another eample: suppose we want to estimate the percent of women who are between 5 3 and 5 6 (63 and 66 inches) These heights are 8 standard deviations below the mean and 0.91 standard deviations above the mean, respectively Using the normal distribution, the probability of falling in this region is 39.0% In the actual data set, 1,029 out of 2,649 women were between 5 3 and 5 6: 38.8% Patrick Breheny to Biostatistics (BIOS 4120) 19/25

Procedure: Percentiles using the normal curve We can also use the normal distribution to approimate percentiles The procedure for calculating percentiles with the normal approimation is as follows: #1 Draw a picture of the normal curve and shade in the appropriate area under the curve #2 Determine the percentiles of the normal curve corresponding to the shaded region using a table or computer #3 Convert from standard units back to the original units: = + z(sd) where, again, is in original units, z is in standard units, is the mean, and SD is the standard deviation Patrick Breheny to Biostatistics (BIOS 4120) 20/25

Approimating percentiles: Eample Suppose instead that we wished to find the 75th percentile of these women s heights For the normal distribution, 0.67 is the 75th percentile The mean plus 0.67 standard deviations in height is 65.35 inches For the actual data, the 75th percentile is 65.39 inches Patrick Breheny to Biostatistics (BIOS 4120) 21/25

The broad applicability of the normal approimation These eamples are by no means special: the distribution of many random variables are very closely approimated by the normal distribution Indeed, this is why statisticians call it the normal distribution Other names for the normal distribution include the Gaussian distribution (after its inventor) and the bell curve (after its shape) For variables with approimately normal distributions, the mean and standard deviation essentially tell us everything about the data other summary statistics and graphics are redundant Patrick Breheny to Biostatistics (BIOS 4120) 22/25

Caution The normal curve Other variables, however, are not approimated by the normal distribution well, and give misleading or nonsensical results when you apply the normal approimation to them For eample, the value 0 lies 1.63 standard deviations below the mean infant mortality rate for Europe The normal approimation therefore predicts a probability that 5.1% of the countries in Europe will have negative infant mortality rates Patrick Breheny to Biostatistics (BIOS 4120) 23/25

Caution (cont d) As another eample, the normal distribution will always predict the median to lie 0 standard deviations above the mean i.e., it will always predict that the median equals the mean As we have seen, however, the mean and median can differ greatly when distributions are skewed For eample, according to the U.S. census bureau, the mean income in the United States is $66,570, while the median income is $48,201 Patrick Breheny to Biostatistics (BIOS 4120) 24/25

The normal curve The distribution of many random variables are very closely approimated by the normal distribution Know how to calculate the area under the normal curve Know how to determine percentiles of the normal curve Know how to approimate probabilities for real data using the normal approimation Know how to approimate quantiles for real data using the normal approimation Patrick Breheny to Biostatistics (BIOS 4120) 25/25