a table or a graph or an equation.

Similar documents
Unit 4 Probability. Dr Mahmoud Alhussami

9/19/2012. PSY 511: Advanced Statistics for Psychological and Behavioral Research 1

Section 3.4 Normal Distribution MDM4U Jensen

Chapter 4a Probability Models

STAT 200 Chapter 1 Looking at Data - Distributions

STT 315 This lecture is based on Chapter 2 of the textbook.

A is one of the categories into which qualitative data can be classified.

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

Elementary Statistics

4.2 The Normal Distribution. that is, a graph of the measurement looks like the familiar symmetrical, bell-shaped

Chapter 6 The Normal Distribution

Learning Objectives for Stat 225

MODULE 9 NORMAL DISTRIBUTION

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

Business Statistics. Lecture 10: Course Review

Essential Statistics Chapter 6

Chapter 6 Group Activity - SOLUTIONS

MATH 1150 Chapter 2 Notation and Terminology

BNG 495 Capstone Design. Descriptive Statistics

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

Review for Exam #1. Chapter 1. The Nature of Data. Definitions. Population. Sample. Quantitative data. Qualitative (attribute) data

Descriptive Univariate Statistics and Bivariate Correlation

Chapter 5. Understanding and Comparing. Distributions

The Empirical Rule, z-scores, and the Rare Event Approach

Chapter 3. Data Description

Chapter 2 Solutions Page 15 of 28

Probability and Samples. Sampling. Point Estimates

Descriptive Data Summarization

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

Chapter 4. Displaying and Summarizing. Quantitative Data

Stat 101 Exam 1 Important Formulas and Concepts 1

Example 2. Given the data below, complete the chart:

Chapter 6. The Standard Deviation as a Ruler and the Normal Model 1 /67

Lecture 10/Chapter 8 Bell-Shaped Curves & Other Shapes. From a Histogram to a Frequency Curve Standard Score Using Normal Table Empirical Rule

(a) (i) Use StatCrunch to simulate 1000 random samples of size n = 10 from this population.

Chapter 1. Looking at Data

Review of Statistics 101

1 Probability Distributions

Review of the Normal Distribution

Tastitsticsss? What s that? Principles of Biostatistics and Informatics. Variables, outcomes. Tastitsticsss? What s that?

SUMMARIZING MEASURED DATA. Gaia Maselli

Lecture 1: Description of Data. Readings: Sections 1.2,

Mean/Average Median Mode Range

Math 223 Lecture Notes 3/15/04 From The Basic Practice of Statistics, bymoore

Introduction to Statistics

Z score indicates how far a raw score deviates from the sample mean in SD units. score Mean % Lower Bound

Chapter 8 Sampling Distributions Defn Defn

CS 361: Probability & Statistics

STAT/SOC/CSSS 221 Statistical Concepts and Methods for the Social Sciences. Random Variables

20 Hypothesis Testing, Part I

are the objects described by a set of data. They may be people, animals or things.

Index I-1. in one variable, solution set of, 474 solving by factoring, 473 cubic function definition, 394 graphs of, 394 x-intercepts on, 474

IV. The Normal Distribution

Week 1: Intro to R and EDA

Quantitative Methods Chapter 0: Review of Basic Concepts 0.1 Business Applications (II) 0.2 Business Applications (III)

Topic 3: Introduction to Statistics. Algebra 1. Collecting Data. Table of Contents. Categorical or Quantitative? What is the Study of Statistics?!

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

Descriptive statistics

FREQUENCY DISTRIBUTIONS AND PERCENTILES

Bemidji Area Schools Outcomes in Mathematics Algebra 2 Applications. Based on Minnesota Academic Standards in Mathematics (2007) Page 1 of 7

The Normal Distribution. Chapter 6

Section 5.4. Ken Ueda

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- #

Describing distributions with numbers

Marquette University MATH 1700 Class 5 Copyright 2017 by D.B. Rowe

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

TOPIC: Descriptive Statistics Single Variable

Chapter 2: Tools for Exploring Univariate Data

Units. Exploratory Data Analysis. Variables. Student Data

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

Continuous Probability Distributions

CS 147: Computer Systems Performance Analysis

Sets and Set notation. Algebra 2 Unit 8 Notes

The empirical ( ) rule

IV. The Normal Distribution

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected

200 participants [EUR] ( =60) 200 = 30% i.e. nearly a third of the phone bills are greater than 75 EUR

STATISTICS/MATH /1760 SHANNON MYERS

Introduction to Basic Statistics Version 2

Section 7.1 Properties of the Normal Distribution

II. The Normal Distribution

Math Literacy. Curriculum (457 topics)

Do students sleep the recommended 8 hours a night on average?

(a) Calculate the bee s mean final position on the hexagon, and clearly label this position on the figure below. Show all work.

Chapter 3. Measuring data

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Continuous Probability Distributions

Sampling, Frequency Distributions, and Graphs (12.1)

Business Statistics. Lecture 3: Random Variables and the Normal Distribution

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics

AP Final Review II Exploring Data (20% 30%)

In this investigation you will use the statistics skills that you learned the to display and analyze a cup of peanut M&Ms.

MA 1125 Lecture 15 - The Standard Normal Distribution. Friday, October 6, Objectives: Introduce the standard normal distribution and table.

The Union and Intersection for Different Configurations of Two Events Mutually Exclusive vs Independency of Events

Chapter 3 Probability Distributions and Statistics Section 3.1 Random Variables and Histograms

Math 140 Introductory Statistics

Math 140 Introductory Statistics

Lecture 1: Descriptive Statistics

Data Analysis and Statistical Methods Statistics 651

Probability and Statistics

Transcription:

Topic (8) POPULATION DISTRIBUTIONS 8-1 So far: Topic (8) POPULATION DISTRIBUTIONS We ve seen some ways to summarize a set of data, including numerical summaries. We ve heard a little about how to sample a population effectively in order to get good estimates of the population quantities of interest (e.g. taking a good sample and calculating the sample mean as a way of estimating the true but unknown population mean value) We ve talked about the ideas of probability and independence. Now we need to start putting all this together in order to do Statistical Inference, the methods of analyzing data and interpreting the results of those analyses with respect to the population(s) of interest. The Probability Distribution for a random variable can be a table or a graph or an equation.

Topic (8) POPULATION DISTRIBUTIONS 8-2 Let s start by reviewing the ideas of frequency distributions for populations using categorical variables. QUALITATIVE (NON-NUMERIC) VARIABLES For a random variable that takes on values of categories, the Probability distribution is a table showing the likelihood of each value. EXAMPLE Tree species found in a boreal forest. For each possible species there would a probability associated with it. E.g. suppose there are 4 species and three are very rare and one is very common. A probability table might look like: Species Probability 1 0.01 2 0.03 3 0.08 4 0.88 All 1.00 We interpret these values as the probability that a random selection would result in observing that species. We could also draw a bar chart but it would be fairly noninformative in this instance since one value is so much larger than the others! An equation cannot be developed since the values that the variable takes on are non-numeric.

Topic (8) POPULATION DISTRIBUTIONS 8-3 QUANTITATIVE (NUMERICAL) VARIABLES A) Discrete Random Variables Recall that a discrete random variable is one that takes on values only from a set of isolated (specific) numbers. The relative frequency distribution for a discrete random variable (also sometimes called a probability mass function) is a list of probabilities for each possible value that the variable can take on. BERNOULLI DISTRIBUTION Suppose the scientist studying the tree species overlaid a grid of square quadrats over the region of interest and then recorded whether any tree was in the quadrat or not. Hence, the random variable is binary, i.e. only two outcomes presence (1) or absence (0). The Bernoulli distribution describes the probability of each outcome: Pr(X=1) = π Pr(X=0) = 1 π The mean for a Bernoulli variable is π and the variance is π(1-π).

Topic (8) POPULATION DISTRIBUTIONS 8-4 POISSON DISTRIBUTION Suppose the scientist studying the tree species overlaid a grid of square quadrats over the region of interest and then counted the number of hickory trees in each quadrat. The histogram of the number of trees per quadrat for all of the quadrats might look like Tree Count 0.40 0.30 0.20 0.10 0 1 2 3 4 5 6 7 8 9 10 11 Quantiles Moments maximum 100.0% 99.5% 97.5% 11.000 8.000 7.000 Mean Std Dev Std Error Mean 2.999 1.750 0.025 quartile 90.0% 75.0% 5.000 4.000 Upper 95% Mean Lower 95% Mean 3.048 2.951 median quartile 50.0% 25.0% 10.0% 3.000 2.000 1.000 N Sum Weights 5000.000 5000.000 minimum 2.5% 0.5% 0.0% 0.000 0.000 0.000

Topic (8) POPULATION DISTRIBUTIONS 8-5 Since we have sampled the entire population (the set of counts for every quadrat in the region), this histogram represents the probability distribution of the random variable X = number of trees/quadrat. In general, the Poisson distribution is a common probability distribution for counts per unit time or unit area or unit volume. The graph can also be described using an equation known as the Poisson Distribution Probability Mass Function. It gives the probability of observing a specific count (x) in any randomly selected quadrat as Pr( X = x) = e µ µ x! x where x! = x( x 1)( x 2)...(3)(2)(1) and x = 0,1, 2,.... In order for this distribution to be a valid probability distribution, we require that the total probability for all possible values equal 1 and that every possible value have a probability associated with it. Pr( X X = 0,1,2,... = x) = X = 0,1,2,... e µ µ x! x = 1

Topic (8) POPULATION DISTRIBUTIONS 8-6 µ e µ and Pr( X = x) = 0 x! The mean of the Poisson distribution is µ and the variance is µ as well. DISCRETE UNIFORM DISTRIBUTION: every discrete value that the random variable can take on has the same probability of occurring. For example, suppose a researcher is interested in whether the number of setae on the first antennae of an insect is random or not. Further, the researcher believes that there must be at least 1 seta and at most 8. Then s/he is postulating that every value between 1 and 8 are equally likely to be observed in a random draw of an insect from the population (or equivalently, that there are equal numbers of insects with 1, 2,, or 8 setae in the population). Such a distribution is known as the Discrete Uniform Distribution. Let K be the total number of distinct values that the random variable can take on (e.g. the set {1, 2,, 8} contains K = 8 distinct values). Then, 1 Pr( X = x) = for x = 1, 2,, 8 K x

Topic (8) POPULATION DISTRIBUTIONS 8-7 In addition, the mean for this particular discrete uniform is 36 = x µ = = 4.5 K 8 and the variance is 2 ( 4.5) = x σ = 5.25. K 2 Also, it is easy to see that the probabilities sum to 1 as required. Finally, the graph of the distribution looks like a rectangle: 0 2 4 6 8

Topic (8) POPULATION DISTRIBUTIONS 8-8 B) Continuous Random Variables Recall that a continuous random variable is one that can take on any value from an interval on the number line. Now, for relative frequency distributions: Fact 1: They show the frequencies of the values of the variable of interest in a set of data: 50 40 30 20 10 Std. Dev = 12.80 Mean = 71.0 0 N = 222.00 40.0 50.0 60.0 70.0 80.0 90.0 45.0 55.0 65.0 75.0 85.0 95.0 TIME where the data have been assigned to specific groupings (bins or categories). The height of each bar is proportional to the relative frequency in the data set of the group it represents. Multiplying the heights by the widths of the bars and adding all the areas gives the total area under in the bars (red). The area under any one bar divided by the total area equals Pr(an observation falls in that grouping)

Topic (8) POPULATION DISTRIBUTIONS 8-9 Fact 2: For a continuous variable and an extremely large population, the number of bars is very large and the heights of the bars approach a smooth curve. This curve is often referred to as a DENSITY CURVE or the probability distribution. The curve describes the shape of the distribution and also depends on the mean and standard deviation of the population under study.

Topic (8) POPULATION DISTRIBUTIONS 8-10 Fact 3: When the curve is describing frequency distribution of the population, every observation must fall within the limits of the distribution. Hence, 100% of the observations are listed.

Topic (8) POPULATION DISTRIBUTIONS 8-11 When we combine these three facts, we get that the density curve describing the frequency distribution of values of a quantitative variable 1) has a total area under the curve of 1 (analogous to 100%) and 2) the area over a range of values equals the relative frequency of that range in the population, i.e. the area equals the probability of observing a value within that range Area in between these two lines is the probability that X falls between the values of 5 and 8. 5 8 There are many standard (common) density curves:

Topic (8) POPULATION DISTRIBUTIONS 8-12 UNIFORM DISTRIBUTION every subset interval of the same length is equal likely. For example, suppose we randomly selected a number from the number line [0, 10]. Then the Probability distribution is given by Pr( a < X < b) = b U a L for X [ L, U] and L, U > 0. Uniform 0 1 2 3 4 5 6 7 8 9 10 e.g. Pr(3<X<4) = The mean of a Uniform distribution is variance is U L = 2 µ and the

Topic (8) POPULATION DISTRIBUTIONS 8-13 NORMAL DISTRIBUTION (Bell-Curve or Gaussian Distribution) symmetric, unimodal and bell-shaped Some interesting facts about the NORMAL DISTRIBUTION: 1. mean = median = mode 2. the shape is perfectly symmetric with equal sized tails 3. the Empirical Rule has an exact form: 68.26 % of the values fall within µ ± σ 95.44% of the values fall within µ ± 2σ 99.74% of the values fall within µ ± 3σ 4. the endpoints of the interval µ ± σ fall exactly at the inflection points of the curve 5. it s the most common distribution for natural phenomena that take on continuous values

Topic (8) POPULATION DISTRIBUTIONS 8-14 Calculating Probabilities Of Events For A Normal Distribution: EXAMPLE IQ as measured by the Stanford-Binet test has a mean of µ=100 and a standard deviation of σ=15. 1. What proportion of the US adult population has an IQ above 100? i.e. find Pr(IQ>100). 2. What proportion of the population has an IQ between 85 and 115? i.e. find Pr(85<IQ<115).

Topic (8) POPULATION DISTRIBUTIONS 8-15 Question: What do we do when the value of interest in the probability phrase does NOT fall exactly at the standard deviation cutoffs? E.g. find Pr(IQ<110)? Answer: Convert the value to a Z-score and use it and a look up table (or a computer program) to calculate the probability. Recall the Z-SCORE for a value is the number of standard deviations that value is from the mean: * x µ Z score = z = σ e.g. IQ of 110 z * 110 µ 110 100 = = = 0667. σ 15

Topic (8) POPULATION DISTRIBUTIONS 8-16 Defn: When X is normally distributed, the Z-score has a STANDARD NORMAL DISTRIBUTION. The Standard normal distribution is a normal distribution with a mean of µ=0 and a standard deviation of σ=1. µ 3σ µ 1σ µ+1σ µ+3σ µ 2σ µ µ+2σ Original IQ score 55 70 85 100 115 130 145 Equivalent Z-score -3-2 -1 0 +1 +2 +3

Topic (8) POPULATION DISTRIBUTIONS 8-17 So, the important point here is that we need to do the conversion X µ a µ Pr( X < a) = Pr < = Pr( Z < σ σ z) in order to find probabilities of events under a normal distribution e.g. IQ µ 110 µ Pr( IQ < 110) = Pr < σ σ IQ 100 110 100 = Pr < = Pr( Z < 15 15 0.667) Next, look up the area (i.e. Probability) on a table: Pr( Z < 0.667) = 0.7486, so approximately 75% of the population has an IQ less than 110.

Topic (8) POPULATION DISTRIBUTIONS 8-18

Topic (8) POPULATION DISTRIBUTIONS 8-19

Topic (8) POPULATION DISTRIBUTIONS 8-20 Some practice which also uses the rules for Probability that we learned earlier: 1. Find Pr(IQ>92) 2. Find Pr(70<IQ<120).

Topic (8) POPULATION DISTRIBUTIONS 8-21 Finding Quantiles for the Normal Distribution Most often used to find extreme values in the very highest (or lowest) percentages EXAMPLE Suppose adult male heights are normally distributed with a mean of 69 and a standard deviation of 3.5. We have learned how to answer questions like: What proportion of the population are taller than 6 (72 )? How do we answer a question like: Find the range of likely heights for the shortest 5% of the male population, i.e. what height is the 5 th percentile of the population? Here we are being asked to find the value of a that makes the following probability statement true: We know that So we ll start by solving Pr(Height < a) = 0.05 Pr(Height < a) = Pr(Z < z * ) Pr(Z < z * )=0.05

Topic (8) POPULATION DISTRIBUTIONS 8-22 for z *. a Now, we ll use the fact that z * = µ and our σ knowledge of the values of µ and σ to solve for a.