Sampling distributions and estimation

1) A brief review of distributions:

We're interested in Pr{three sixes when throwing a single die 8 times}. => Y has a binomial distribution, or in official notation, Y ~ BIN(n,p). (The symbol => means "implies".)

Other binomial examples:

Pr{3 mutants in a sample of 5}

Pr{8 people with blood type O in a sample of 20}

In all cases Y = number of successes, and Y ~ BIN.

We're interested in Pr{Y ≥ 120} where Y is the IQ of a randomly chosen person. It turns out that IQ is normally distributed, and so Y ~ N(μ,σ).

Other normal examples:

Pr{Y ≤ y} where y is a specific blood oxygen level, assuming blood oxygen level is normally distributed (actually, it probably isn't).

Height - a classic example of a normally distributed random variable. So if Y = height => Y ~ N(μ,σ).

Of course, there are many other distributions: Poisson, uniform, etc. If, for example, we have something we know has a Poisson distribution (e.g., the number of ticks on a dog), we could write Y ~ Poi(μ). (The Poisson distribution has only one parameter, μ.)

2) But this is all review; what's the point here?

Often, in statistics, we're not interested in the probability of an individual number (the examples above). Instead, we are interested in the distribution of the parameter estimates (or sometimes functions of the parameter estimates). This sounds confusing, but here's an example:

What is the distribution of the sample mean, ȳ? (We can also ask about the distribution of s² or p̂, but we won't do much with the distributions of these in this course.)
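The binomial probability mentioned above is easy to check numerically. Here is a minimal sketch in Python (the notes themselves used R; the function name is ours):

```python
from math import comb

def binom_pmf(k, n, p):
    """Pr{Y = k} for Y ~ BIN(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Pr{three sixes when throwing a single die 8 times}: n = 8, p = 1/6
p_three_sixes = binom_pmf(3, 8, 1/6)
print(round(p_three_sixes, 4))  # roughly 0.10
```

The same function handles the other binomial examples (mutants, blood types) by changing k, n, and p.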

Why is this interesting? Because if we know the distribution of ȳ, we can ask questions like: What is the probability that the average height of men is less than 68 inches? Or better: What is the probability that the average blood pressure with medication is lower than the average blood pressure with a placebo? We can start asking (and answering!) questions that are important in biology, medicine, and elsewhere!

So, to summarize, a sampling distribution is a distribution of one of our parameter estimates, such as ȳ, s², or p̂.

3) Another way of looking at it (similar to the book):

Let's do an experiment many times (for example, we take a sample many times). Each time we calculate the sample mean, so we'd have ȳ₁, ȳ₂, ȳ₃, ..., ȳ_m. Then we ask: What is the distribution of these m sample means?

But we also want to know: What is the population mean of all these sample means? What are the population variance and population standard deviation of these sample means? (And, for that matter, how do we estimate these using ȳ and s²?)

As it turns out, we don't need to do m experiments for this to work. Instead we can use some mathematics and figure it out. The math is similar to that used for the mathematical definition of a population mean (in other words, we use expected values to calculate this). We won't go into the details here.

4) So we need to know the answers to the above questions. Without doing the math, here they are:

The (theoretical or population) mean for ȳ is given as follows: μ_ȳ = μ

The (theoretical or population) standard deviation for ȳ is given by: σ_ȳ = σ/√n
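The "do the experiment m times" idea can be simulated directly, and the simulation confirms both results, μ_ȳ = μ and σ_ȳ = σ/√n. A quick sketch in Python (the values of μ, σ, n, and m are made up for illustration; the notes used R for simulations):

```python
import random
import statistics

random.seed(1)

mu, sigma = 64, 3    # population parameters, assumed known for the simulation
n, m = 25, 20000     # sample size, and number of repeated samples

# take m samples of size n, computing the sample mean each time:
# ybar_1, ybar_2, ..., ybar_m
ybars = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(m)]

print(statistics.mean(ybars))   # close to mu = 64
print(statistics.stdev(ybars))  # close to sigma / sqrt(n) = 3/5 = 0.6
```

The mean of the m sample means lands near μ, and their standard deviation lands near σ/√n, which is exactly what the formulas above claim.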

Are you surprised by the first result? Hopefully not. It makes a certain intuitive sense that if ȳ estimates μ, then the mean of a bunch of ȳ's also estimates μ.

The second result is more difficult to understand (it's not quite intuitive). But think about the following: if you use a bunch of ȳ's, you would expect them to be less spread out than the original data. In other words, you would expect the variance (or standard deviation) for ȳ to be less than the variance (or standard deviation) for y. Dividing by √n lets us do that. (The ȳ's are closer to each other than the y's.) And remember that we could do some math and show that this is actually the case.

Here's a graphical representation of what's going on:

All of this now leads us directly into the next question. So what about the actual distribution of Ȳ?

If Y is normal, Ȳ is normal (again, hopefully no surprise). If Y is not normal, then if n is large enough, Ȳ is still approximately normal!

This answer is due to the Central Limit Theorem (CLT). This is one of the reasons that the normal distribution is so important to statistics. (There are some theoretical exceptions to this theorem, but in practice they're usually not that important.) In other words: take almost any distribution at all. Take the average, ȳ. The average, ȳ, will be normally distributed if n is large enough. (What is a large enough n? We'll answer that question later when we start looking at the t-distribution.)

So, to summarize, what does all this mean? That we can use our normal tables to calculate probabilities associated with Ȳ. For example, we can answer questions like: Pr{Ȳ < 68 inches}, or Pr{Ȳ₁ < Ȳ₂} where Ȳ₁ = average blood pressure on medication and Ȳ₂ = average blood pressure on placebo. (In practice, it turns out that we'll usually wind up using t-tables (i.e., the t-distribution) since our n will often not be large enough, but more on that next time.)

Questions like these are much more interesting than the ones we've been asking until now. This will lead us directly into confidence intervals and then (finally!) statistical hypothesis tests. These concepts are really important, so if you didn't follow everything in class, you might want to reread the notes and ask questions.
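The CLT claim ("take almost any distribution at all") can be seen by simulation: draw from a strongly skewed distribution, and the raw values are far from normal, but their sample means are much more symmetric. A sketch in Python; the choice of an exponential distribution and n = 50 here is ours, purely for illustration:

```python
import random
import statistics

random.seed(2)

def skewness(xs):
    """Sample skewness: near 0 for a symmetric (e.g. normal) distribution."""
    m, sd = statistics.mean(xs), statistics.stdev(xs)
    return sum(((x - m) / sd) ** 3 for x in xs) / len(xs)

# raw draws from a skewed (exponential) distribution
raw = [random.expovariate(1.0) for _ in range(5000)]

# sample means of n = 50 draws each: much closer to normal, per the CLT
means = [statistics.mean(random.expovariate(1.0) for _ in range(50))
         for _ in range(5000)]

print(skewness(raw))    # large (theoretically 2 for the exponential)
print(skewness(means))  # much closer to 0
```

The means inherit almost none of the original skew, which is the CLT at work; how large n must be depends on how non-normal the starting distribution is.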

5) Enough theory. Let's see how to use sampling distributions.

Let's look at human heights (we're trying to do something intuitive). If we look at the height of women, the average height is about 64 inches, and the standard deviation is about 3 inches. If we assume that these represent μ and σ (we don't really know these), then we have:

μ = 64 inches and σ = 3 inches

We know how to calculate simple probabilities, so let's calculate the probability that a single woman is more than 6 feet (= 72 inches) tall. Since height is normally distributed, we can simply do:

Pr{Y > 72} = Pr{Z > 2.67} = 0.0038

We have a small but real probability. Now what happens if we randomly choose 5 women and want to know the probability that their average height is more than 6 feet? We want: Pr{Ȳ > 72}

We can calculate z, but this time we need to be a little careful in our denominator. We need to do:

z = (ȳ - μ_ȳ)/σ_ȳ = (ȳ - μ)/(σ/√n)

In other words, we need to use the μ and σ for ȳ. What is the (population) mean of Ȳ for our five women? What is the population standard deviation?

μ_ȳ = μ = 64 inches

σ_ȳ = σ/√n = 3/√5 = 1.342 inches

So now we can calculate our probability as follows:

z = (ȳ - μ_ȳ)/σ_ȳ = (72 - 64)/1.342 = 5.96

so we have Pr{Ȳ > 72} = Pr{Z > 5.96} = 1.26119e-09

Notice the absurdly small value for the probability (we had to use R to get it). This is because it is extremely (extremely!) unlikely that any sample of 5 women will have an average height greater than 6 feet.
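These two tail probabilities don't require R specifically; any language with the error function will do. A sketch in Python using only the standard library (the helper function name is ours):

```python
from math import erf, sqrt

def norm_tail(y, mu, sd):
    """Pr{Y > y} for Y ~ N(mu, sd), via the standard normal CDF."""
    z = (y - mu) / sd
    return 1 - 0.5 * (1 + erf(z / sqrt(2)))

# one woman: Y ~ N(64, 3)
print(norm_tail(72, 64, 3))            # about 0.0038

# mean of n = 5 women: Ybar ~ N(64, 3/sqrt(5))
print(norm_tail(72, 64, 3 / sqrt(5)))  # on the order of 1e-9
```

The only difference between the two calls is the standard deviation: σ for a single individual, σ/√n for the sample mean.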

6) Standard Error (SE)

You should realize that sample size can change probabilities drastically compared to just taking a single individual (n = 1).

We won't explain all of this right away, but let's start to worry about the fact that we almost never know μ and σ. So what do we do instead? How can we answer probability questions if we don't know μ and σ? We won't learn the answer to that yet, but it should be obvious that somehow we need to use our estimates (ȳ and s²) instead of μ and σ. We estimate μ with ȳ, and σ with s. We've discussed this before. The fact that we have to estimate σ with s will cause us some problems later, but let's not worry about it yet. (The fact that we don't know μ or σ will change our sampling distribution.)

So what happens to the standard deviation for Ȳ? Instead of using σ we use s. In other words, we estimate σ/√n with s/√n. This quantity is also known as the Standard Error (in this case for ȳ):

SE_ȳ = s/√n

So what does the Standard Error tell us? Basically how reliable our ȳ is as an estimate of the mean. If the SE is small, our ȳ is a pretty good estimate; if it's big, our ȳ is not so good an estimate. (More on this next time.)

Important: the Standard Error is not always the same; it depends on what you're interested in. If you're interested in ȳ, then it's defined as above. You can also have a standard error for proportions, for differences in means, and for many other quantities. We'll encounter some of these later.
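Computing SE_ȳ from data is one line once you have s and n. A minimal sketch in Python (the sample heights here are made up for illustration):

```python
from math import sqrt
import statistics

def standard_error(sample):
    """Standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / sqrt(len(sample))

heights = [63.1, 66.4, 61.8, 64.9, 65.2]  # hypothetical heights (inches), n = 5
print(standard_error(heights))
```

Note that this uses s (computed from the sample), not σ, which is exactly the substitution described above.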

Let's finish this set of notes by looking at the effect of sample size on some of the quantities we've discussed. We'll use the example for the heights of women above. Using R, here are some simulated data for various sample sizes assuming μ = 64 inches and σ = 3 inches:

estimate    n = 10    n = 100    n = 1000    as n → ∞
ȳ           64.51     64.16      63.97       ȳ → μ
s           2.60      2.91       2.96        s → σ
SE          0.82      0.29       0.09        SE → 0

As the sample size approaches infinity, we know more and more about the population. If we had perfect knowledge (if we measured all women), then we really would know the values of μ and σ. Similarly, since the SE measures uncertainty in our ȳ, it goes to 0 (if we know μ then ȳ = μ, and we no longer have any uncertainty). Remember - Ȳ is random, μ is an (unknown) constant.
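The pattern in the table (ȳ → μ, s → σ, SE → 0) can be reproduced with a quick simulation. A sketch in Python rather than the R the notes used; the exact numbers will differ from the table above because the random draws differ:

```python
import random
import statistics
from math import sqrt

random.seed(3)
mu, sigma = 64, 3  # assumed population values, as in the notes

results = {}
for n in (10, 100, 1000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    ybar = statistics.mean(sample)
    s = statistics.stdev(sample)
    se = s / sqrt(n)
    results[n] = (ybar, s, se)
    print(f"n = {n:4d}: ybar = {ybar:6.2f}, s = {s:4.2f}, SE = {se:4.2f}")
```

As n grows, ȳ and s settle near μ and σ while SE shrinks toward 0, matching the limiting column of the table.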

Remember:

μ and σ are not random variables. They don't change, and are obviously not affected by sample size!

ȳ and s² are random variables. They can change depending on what's in your sample, but usually don't change drastically with sample size.

SE is also random. SE is affected by sample size (it gets smaller, which is easy to see from the equation above).

You should make sure you thoroughly understand this example!