Populations and samples


Example

Two strains of mice: A and B. Measure cytokine IL10 (in males, all the same age) after treatment.

[Figure: IL10 measurements for strains A and B, plotted on the original scale and on a log scale.]

Point: We're not interested in these particular mice, but in aspects of the distributions of IL10 values in the two strains.

Populations and samples

We are interested in the distribution of measurements in the underlying (possibly hypothetical) population.

Examples:
- An infinite number of mice from strain A; cytokine response to treatment.
- All T cells in a person; respond or not to an antigen.
- All possible samples from the Baltimore water supply; concentration of cryptosporidium.
- All possible samples of a particular type of cancer tissue; expression of a certain gene.

We can't see the entire population (whether it is real or hypothetical), but we can see a random sample from the population (perhaps a set of independent, replicated measurements).

Parameters

The object of our interest is the population distribution or, in particular, certain numerical attributes of the population distribution (called parameters).

Examples:
- mean
- median
- SD
- proportion = 1
- proportion > 40
- geometric mean
- 95th percentile

Parameters are usually assigned Greek letters (like θ, μ, and σ).

Sample data

We make n independent measurements (or draw a random sample of size n). This gives X_1, X_2, ..., X_n, independent and identically distributed (iid), following the population distribution.

Statistic: a numerical summary (function) of the X's (that is, of the data). For example, the sample mean, sample SD, etc.

Estimator (estimate): a statistic, viewed as estimating some population parameter.

We write:
  θ̂   an estimator of θ
  p̂   an estimator of p
  X̄ = μ̂   an estimator of μ
  S = σ̂   an estimator of σ
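
To make the parameter/statistic/estimator distinction concrete, here is a minimal Python sketch (not from the original slides): it draws one random sample from a hypothetical population and computes several statistics, each viewed as an estimator of the corresponding population parameter. The population, its parameters, and the sample size are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: lognormal measurements with assumed parameters.
n = 30
x = rng.lognormal(mean=3.0, sigma=0.5, size=n)   # the iid sample X_1, ..., X_n

# Statistics, each viewed as an estimator of a population parameter:
xbar  = x.mean()            # sample mean          -> estimates the population mean
s     = x.std(ddof=1)       # sample SD (n - 1)    -> estimates the population SD
med   = np.median(x)        # sample median        -> estimates the population median
p_hat = np.mean(x > 40)     # sample proportion>40 -> estimates Pr(X > 40)

print(f"mean={xbar:.1f}  SD={s:.1f}  median={med:.1f}  prop>40={p_hat:.2f}")
```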

Parameters, estimators, estimates

μ    The population mean. A parameter. A fixed quantity. Unknown, but what we want to know.

X̄    The sample mean. An estimator of μ. A function of the data (the X's). A random quantity.

x̄    The observed sample mean. An estimate of μ. A particular realization of the estimator, X̄. A fixed quantity, but the result of a random process.

Estimators are random variables

Estimators have distributions, means, SDs, etc.

X_1, X_2, ..., X_10                                           X̄
 3.8   8.0   9.9  13.1  15.5  16.6  22.3  25.4  31.0  40.0   18.6
 6.0  10.6  13.8  17.1  20.2  22.5  22.9  28.6  33.1  36.7   21.2
 8.1   9.0   9.5  12.2  13.3  20.5  20.8  30.3  31.6  34.6   19.0
 4.2  10.3  11.0  13.9  16.5  18.2  18.9  20.4  28.4  34.4   17.6
 8.4  15.2  17.1  17.2  21.2  23.0  26.7  28.2  32.8  38.0   22.8
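
The table above can be mimicked in a few lines of Python. This is only a sketch under an assumed population (a gamma distribution chosen so the values land on a similar scale); the point is simply that each new sample produces a new value of the estimator X̄.

```python
import numpy as np

rng = np.random.default_rng(2)

# Draw five samples of size 10 from the same (assumed) population and
# compute X-bar each time: a new sample gives a new value of the estimator.
for _ in range(5):
    x = np.sort(rng.gamma(shape=4, scale=5, size=10))   # one sample X_1, ..., X_10
    print(np.round(x, 1), " X-bar =", round(x.mean(), 1))
```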

Sampling distribution

[Figure: the distribution of X̄ for samples of size n = 5.]

The sampling distribution depends on:
- the type of statistic
- the population distribution
- the sample size

Bias, SE, RMSE

[Figure: the distribution of the sample SD for samples of size n = 10.]

Consider θ̂, an estimator of the parameter θ.

Bias:                 E(θ̂ − θ) = E(θ̂) − θ
Standard error (SE):  SE(θ̂) = SD(θ̂)
RMS error (RMSE):     √( E{(θ̂ − θ)²} ) = √( (bias)² + (SE)² )
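
When an estimator's distribution is hard to derive, its bias, SE, and RMSE can be approximated by simulation. The sketch below (not part of the slides) does this for the sample SD under an assumed normal population with known σ, which also lets us check the identity RMSE² = (bias)² + (SE)².

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo approximation of bias, SE and RMSE for the sample SD, s,
# with an assumed normal(0, sigma) population so the true value is known.
sigma, n, n_sim = 4.0, 10, 100_000
s = rng.normal(0.0, sigma, size=(n_sim, n)).std(axis=1, ddof=1)

bias = s.mean() - sigma                       # E(theta-hat) - theta
se   = s.std()                                # SD(theta-hat)
rmse = np.sqrt(np.mean((s - sigma) ** 2))     # sqrt(E[(theta-hat - theta)^2])

print(f"bias={bias:.3f}  SE={se:.3f}  RMSE={rmse:.3f}")
print(f"sqrt(bias^2 + SE^2) = {np.sqrt(bias**2 + se**2):.3f}")   # matches RMSE
```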

Example: heights of students

[Figure: histogram of student heights (inches); mean = 67.1, SD = 3.9.]

[Figure: the population of heights (mean = 67.1, SD = 3.9) alongside the distributions of the sample average for n = 5 (mean = 67.1, SD = 1.7), n = 10 (mean = 67.1, SD = 1.2), and n = 25 (mean = 67.1, SD = 0.68).]

The sample mean

Assume X_1, X_2, ..., X_n are iid with mean μ and SD σ.

Mean of X̄:       E(X̄) = μ.
Bias:             E(X̄) − μ = 0.
SE of X̄:         SD(X̄) = σ/√n.
RMS error of X̄:  √( (bias)² + (SE)² ) = σ/√n.

If the population is normally distributed

If X_1, X_2, ..., X_n are iid normal(μ, σ), then X̄ ~ normal(μ, σ/√n).

[Figure: the normal(μ, σ) population and the normal(μ, σ/√n) distribution of X̄.]
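
A quick simulation check of SD(X̄) = σ/√n, treating the mean and SD from the height example as the (assumed) population values:

```python
import numpy as np

rng = np.random.default_rng(4)

# Check SD(X-bar) = sigma / sqrt(n), taking the height example's mean and SD
# as the (assumed) population values.
mu, sigma, n_sim = 67.1, 3.9, 100_000
for n in (5, 10, 25):
    xbar = rng.normal(mu, sigma, size=(n_sim, n)).mean(axis=1)
    print(f"n={n:2d}  mean={xbar.mean():.1f}  SD={xbar.std():.2f}  "
          f"sigma/sqrt(n)={sigma / np.sqrt(n):.2f}")
```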

Example

Suppose X_1, X_2, ..., X_10 are iid normal(mean = 10, SD = 4).
Then X̄ ~ normal(mean = 10, SD ≈ 1.26); let Z = (X̄ − 10)/1.26.

Pr(X̄ > 12) = Pr(Z > 1.58) ≈ 5.7%
Pr(9.5 < X̄ < 10.5) = Pr(−0.40 < Z < 0.40) ≈ 31%
Pr(|X̄ − 10| > 1) = Pr(|Z| > 0.80) ≈ 43%

Central limit theorem

If X_1, X_2, ..., X_n are iid with mean μ and SD σ, and the sample size (n) is large, then X̄ is approximately normal(μ, σ/√n).

How large is large? It depends on the population distribution. (But, generally, not too large.)
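
These probabilities can be verified directly in Python (assuming SciPy is available), using the normal(10, 4/√10) distribution of X̄:

```python
import numpy as np
from scipy.stats import norm

# The three probabilities above, for X-bar ~ normal(10, 4/sqrt(10)).
mu, se = 10, 4 / np.sqrt(10)                            # se is about 1.26

p1 = norm.sf(12, loc=mu, scale=se)                      # Pr(X-bar > 12)
p2 = norm.cdf(10.5, mu, se) - norm.cdf(9.5, mu, se)     # Pr(9.5 < X-bar < 10.5)
p3 = 2 * norm.sf(11, mu, se)                            # Pr(|X-bar - 10| > 1)

print(f"{p1:.3f}  {p2:.3f}  {p3:.3f}")                  # about 0.057, 0.307, 0.429
```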

Example 1

[Figure: the distribution of X̄ for samples of size n = 5 from a first example population.]

Example 2

[Figure: the distribution of X̄ for samples from a second example population, for sample sizes up to n = 500.]

Example 2 (rescaled)

[Figure: the same distributions of X̄ as in Example 2, with the horizontal axis rescaled for each sample size (up to n = 500).]

Example 3

{X_i} iid with Pr(X_i = 0) = 90% and Pr(X_i = 1) = 10%, so E(X_i) = 0.1 and SD(X_i) = 0.3.

Then Σ X_i ~ binomial(n, p), and X̄ = the proportion of 1's.

[Figure: the distribution of X̄ (the sample proportion) for sample sizes up to n = 500.]
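
A short simulation of Example 3 (with sample sizes chosen arbitrarily) shows the sample proportion behaving as the CLT predicts, with SD close to √(p(1 − p)/n):

```python
import numpy as np

rng = np.random.default_rng(5)

# CLT for a proportion: each X_i is 1 with probability p = 0.1, else 0,
# and X-bar is the proportion of 1's. Its SD should be close to sqrt(p(1-p)/n).
p, n_sim = 0.1, 100_000
for n in (10, 100, 500):
    prop = rng.binomial(n, p, size=n_sim) / n            # simulated values of X-bar
    print(f"n={n:3d}  mean={prop.mean():.3f}  SD={prop.std():.4f}  "
          f"sqrt(p(1-p)/n)={np.sqrt(p * (1 - p) / n):.4f}")
```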

The sample SD

Why use (n − 1) in the sample SD?

    s = √( Σ (x_i − x̄)² / (n − 1) )

If {X_i} are iid with mean μ and SD σ, then

    E(s²) = σ²    but    E{ [(n−1)/n] s² } = [(n−1)/n] σ² < σ²

In other words:

    Bias(s²) = 0    but    Bias( [(n−1)/n] s² ) = [(n−1)/n] σ² − σ² = −σ²/n

The distribution of the sample SD

If X_1, X_2, ..., X_n are iid normal(μ, σ), then the sample SD, s, satisfies

    (n − 1) s² / σ² ~ χ² with n − 1 degrees of freedom.

When the X_i are not normally distributed, this is not true.

[Figure: χ² distributions with df = 9, 19, 29.]
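
Both facts can be checked by simulation under an assumed normal population with known σ: s² with divisor n − 1 is unbiased, the divisor-n version is biased downward by σ²/n, and (n − 1)s²/σ² matches the χ² distribution with n − 1 degrees of freedom. A sketch (SciPy assumed available for the χ² quantile):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)

# Assumed normal(0, sigma) population.
sigma, n, n_sim = 2.0, 10, 100_000
x = rng.normal(0.0, sigma, size=(n_sim, n))

s2   = x.var(axis=1, ddof=1)      # divisor n - 1: unbiased for sigma^2
s2_n = x.var(axis=1, ddof=0)      # divisor n: biased downward by sigma^2 / n

print("E(s^2)         ~", round(s2.mean(), 3), "  (sigma^2 =", sigma**2, ")")
print("E((n-1)/n s^2) ~", round(s2_n.mean(), 3),
      "  (theory:", round((n - 1) / n * sigma**2, 3), ")")

# (n-1) s^2 / sigma^2 should follow a chi-square distribution with n-1 df:
stat = (n - 1) * s2 / sigma**2
print("simulated 95th percentile:", round(np.quantile(stat, 0.95), 2),
      "  chi-square(n-1) 95th percentile:", round(chi2.ppf(0.95, df=n - 1), 2))
```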

Distribution of the sample SD (based on normal data)

[Figure: the distribution of the sample SD for normal data, with n = 3, 5, 10, 25.]

A non-normal example

[Figure: the distribution of the sample SD for a non-normal population, for several sample sizes (n = 3, 5, ...).]