Sampling distributions and estimation

1) A brief review of distributions:

We're interested in Pr{three sixes when throwing a single die 8 times}. => Y has a binomial distribution, or in official notation, Y ~ BIN(n,p). (The symbol => means "implies".)

Other binomial examples: Pr{3 mutants in a sample of 5}; Pr{8 people with blood type O in a sample of 20}. In all cases Y = number of successes, and Y ~ BIN.

We're interested in Pr{Y ≥ 120} where Y is the IQ of a randomly chosen person. It turns out that IQ is normally distributed, and so Y ~ N(μ,σ).

Other normal examples: Pr{Y ≥ y} where y is a specific blood oxygen level, assuming blood oxygen level is normally distributed (actually, it probably isn't). Height - a classic example of a normally distributed random variable. So if Y = height => Y ~ N(μ,σ).

Of course, there are many other distributions: Poisson, uniform, etc. If, for example, we have something we know has a Poisson distribution (e.g., the number of ticks on a dog) we could write Y ~ Poi(μ). (The Poisson distribution has only one parameter, μ.)

2) But this is all review; what's the point here?

Often, in statistics, we're not interested in the probability of an individual number (the examples above). Instead, we are interested in the distribution of the parameter estimates (or sometimes functions of the parameter estimates). This sounds confusing, but here's an example: What is the distribution of the sample mean, ȳ? (We can also ask about the distribution of s² or p̂, but we won't do much with the distributions of these in this course.)
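As a quick sanity check of the binomial example above, the probability of exactly three sixes in eight throws can be computed directly from the binomial formula using only Python's standard library (math.comb gives the binomial coefficient):

```python
from math import comb

# Pr{Y = 3} for Y ~ BIN(n = 8, p = 1/6): exactly three sixes in eight throws
n, p, k = 8, 1/6, 3
prob = comb(n, k) * p**k * (1 - p)**(n - k)
print(round(prob, 4))  # -> 0.1042
```

The same pattern works for the other binomial examples by swapping in their n, p, and k.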

Why is this interesting? Because if we know the distribution of ȳ, we can ask questions like: What is the probability that the average height of men is less than 68 inches? Or better: What is the probability that the average blood pressure with medication is lower than the average blood pressure with a placebo? We can start asking (and answering!) questions that are important in biology, medicine, and elsewhere!

So, to summarize, a sampling distribution is a distribution of one of our parameter estimates, such as ȳ, s², p̂, etc.

3) Another way of looking at it (similar to the book):

Let's do an experiment many times (for example, we take a sample many times). Each time we calculate the sample mean, so we'd have ȳ₁, ȳ₂, ȳ₃, ... ȳₘ. Then we ask: What is the distribution of these sample means? [Your book uses the term "meta-experiment".]

But we also want to know: What is the population mean of all these sample means? What are the population variance and population standard deviation of these sample means? (And, for that matter, how do we estimate these using ȳ and s²?)

As it turns out, we don't need to do m experiments for this to work. Instead we can use some mathematics and figure it out. The math is similar to that used for the mathematical definition of a population mean (in other words, we use expected values to calculate this). We won't go into the details here.

4) So we need to know the answers to the above questions. Without doing the math, here they are:

The (theoretical or population) mean for ȳ is given as follows: μ_ȳ = μ

The (theoretical or population) standard deviation for ȳ is given by: σ_ȳ = σ/√n
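The meta-experiment idea can be run on a computer. A minimal sketch, in which the N(500, 120) population, the sample size n = 4, and the m = 2000 repetitions are all illustrative assumptions rather than anything from the text:

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Draw m samples of size n from an assumed N(mu = 500, sigma = 120) population,
# and compute the sample mean of each one.
mu, sigma, n, m = 500, 120, 4, 2000
means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(m)]

# The mean of the sample means should sit near mu = 500,
# and their standard deviation near sigma / sqrt(n) = 60.
print(round(statistics.mean(means), 1))
print(round(statistics.stdev(means), 1))
```

The two printed numbers land close to 500 and 60, matching μ_ȳ = μ and σ_ȳ = σ/√n without doing any of the expected-value math.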

See also p. 159 [159] {151} in your text.

Are you surprised by the first result? Hopefully not. It makes a certain intuitive sense that if ȳ estimates μ, then the mean of a bunch of ȳ's also estimates μ.

The second result is more difficult to understand (it's not quite intuitive). But think about the following: if you take a bunch of ȳ's, you would expect them to be less spread out than the original data. In other words, you would expect the variance (or standard deviation) for ȳ to be less than the variance (or standard deviation) for Y. (The ȳ's are closer to each other than the y's.) And remember that we could do some math and show that this is actually the case.

Here's a graphical representation of what's going on:

Which also leads us directly into the next question.

So what about the actual distribution of Ȳ?

If Y is normal, Ȳ is normal (again, hopefully no surprise). If Y is not normal, then if n is large enough, Ȳ is still approximately normal!

This answer is due to the Central Limit Theorem (CLT). This is one of the reasons that the normal distribution is so important to statistics. (There are some theoretical exceptions to this theorem, but in practice they're usually not that important.)

In other words: take almost any distribution at all. Take the average, ȳ. The average ȳ will be normally distributed if n is large enough. (What is a large enough n? We'll answer that question later when we start looking at the t-distribution.)

So, to summarize, what does all this mean? That we can use our normal tables to calculate probabilities associated with Ȳ. For example, we can answer questions like:

Pr{Ȳ < 68 inches} or Pr{Ȳ₁ < Ȳ₂}

where Ȳ₁ = average blood pressure on medication and Ȳ₂ = average blood pressure on placebo.

(In practice, it turns out that we'll usually wind up using t-tables (i.e., the t-distribution) since our n will often not be large enough, but more on that next time.)

Questions like these are much more interesting than the ones we've been asking until now. This will lead us directly into confidence intervals and then (finally!) statistical hypothesis tests. These concepts are really important, so if you didn't follow everything in class, you might want to reread the notes or go through the text: see section 5.1 [5.1] {5.1}. If you have the 2nd or 3rd edition, you should also read through 5.3 [5.3]; if you have the 4th edition, read through {5.2}.
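The CLT claim - averages look normal even when the raw data don't - can be illustrated by averaging draws from a clearly skewed distribution. An exponential distribution is used here purely as an illustrative assumption (for Exp(1), μ = 1 and σ = 1, so the averages should center near 1 with spread near 1/√n):

```python
import random
import statistics

random.seed(2)  # fixed seed for reproducibility

# Average n = 50 draws from a skewed Exp(1) distribution, m = 5000 times.
n, m = 50, 5000
means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(m)]

# Despite the skew of the raw data, the means pile up symmetrically
# around mu = 1 with standard deviation near 1 / sqrt(50) = 0.141.
print(round(statistics.mean(means), 2))
print(round(statistics.stdev(means), 3))
```

Plotting a histogram of `means` (not shown) gives the familiar bell shape, even though a histogram of the raw exponential draws is strongly right-skewed.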

5) Enough theory. Let's see how to use sampling distributions.

Example 5.7, p. 160 [5.9, p. 159] {5.2.2, p. 152}: We're looking at princess peas. Somehow, we again know that μ = 500 mg and σ = 120 mg. We also know that Y = seed weight is normally distributed. We take a sample of 4 seeds. What is the (population) mean of Ȳ for our four seeds? What is the population standard deviation?

μ_ȳ = μ = 500 mg

σ_ȳ = σ/√n = 120/√4 = 60 mg

Now suppose we want to calculate Pr{Ȳ > 550 mg}. You proceed the same way as always, except you need to make sure you use the correct value for σ. Use σ for Ȳ, NOT σ for Y. So let's continue with the example:

z = (ȳ - μ_ȳ)/σ_ȳ = (550 - 500)/60 = 0.83

so we have Pr{Ȳ > 550} = Pr{Z > 0.83} = 1 - 0.7967 = 0.2033

Again, note that we use 60, not 120, in the denominator above. If you have the 4th edition, you might also want to check out example {4.2.4, p. 155}, which explains the effect of sample size.

6) Standard Error (SE)

(We skipped over the rest of chapter 5 and are now in chapter 6!)

Let's start to worry about the fact that we almost never know μ and σ. So what do we do instead? How can we answer probability questions if we don't know μ and σ? We won't learn the answer to that yet, but it should be obvious that somehow we need to use our estimates (ȳ and s²) instead of μ and σ. We estimate μ with ȳ, and σ with s. We've discussed this before. The fact that we have to estimate σ with s will cause us a few problems later, but let's not worry about it now.
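The Pr{Ȳ > 550} calculation from the princess pea example can be reproduced with the standard library's statistics.NormalDist, which replaces the normal table:

```python
from statistics import NormalDist

# Princess pea example: Ybar ~ N(mu = 500, sigma_ybar = 120 / sqrt(4) = 60 mg)
mu, sigma, n = 500, 120, 4
sigma_ybar = sigma / n**0.5           # use 60, NOT 120, in the denominator
p = 1 - NormalDist(mu, sigma_ybar).cdf(550)
print(round(p, 4))
```

This prints about 0.2023, slightly off from the 0.2033 in the worked example only because the table lookup rounds z = 0.8333 to 0.83.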

So how does all this affect our sampling distributions if we don't know μ or σ? In particular, what about the standard deviation for Ȳ? Instead of using σ, we use s. In other words, we estimate σ/√n with s/√n.

This quantity is also known as the Standard Error (in this case, for ȳ):

SE_ȳ = s/√n

So what does the Standard Error tell us? Basically, how reliable our ȳ is as an estimate of the mean. If the SE is small, our ȳ is a pretty good estimate; if it's big, our ȳ is not so good an estimate. More on this next time.

Important: the Standard Error is not always the same; it depends on what you're interested in. If you're interested in ȳ, then it's defined as above. You can also have a standard error for proportions and other quantities.

Let's finish this set of notes by looking at the effect of sample size on some of the quantities we've discussed. Example 6.4, p. 185 [6.4, p. 182] {6.3.2, p. 173-174}: 28 lambs were weighed at birth with the following results:

ȳ = 5.17 kg
s = 0.65 kg
SE = 0.12 kg    (SE = 0.65/√28 = 0.12; you can figure out the SEs for the rest of the example yourself)

Now what would happen if instead of 28 lambs, we had sampled higher numbers of lambs?
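The SE for the lamb example is a one-line computation, shown here to make the SE_ȳ = s/√n formula concrete:

```python
# Lamb example: n = 28 lambs weighed at birth, sample standard deviation s = 0.65 kg
n, s = 28, 0.65
se = s / n**0.5   # standard error of the sample mean
print(round(se, 2))  # -> 0.12
```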

Remember:

μ and σ are not random variables. They don't change, and are obviously not affected by sample size!

ȳ and s² are random variables. They can change depending on what's in your sample, but usually don't change drastically with sample size.

SE is also random. SE is affected by sample size (it gets smaller as n grows, which is easy to see from the equation above).

You should make sure you thoroughly understand this example!
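The shrinking of SE with sample size can be made concrete by holding s fixed and varying n. The sample sizes below are illustrative, not from the notes; note how each quadrupling of n halves the SE, since n appears under a square root:

```python
# Effect of sample size on SE, with s held fixed at 0.65 kg (lamb example).
# Sample sizes 112 and 448 are hypothetical, chosen to quadruple n each step.
s = 0.65
for n in (28, 112, 448):
    print(n, round(s / n**0.5, 3))
```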