MVE055/MSG810 Lecture 8

MVE055/MSG810 2017 Lecture 8
Petter Mostad, Chalmers
September 23, 2017

The Central Limit Theorem (CLT)

Assume X₁, ..., Xₙ is a random sample from a distribution with expectation µ and variance σ². Then, as n → ∞, the distribution of X̄ approaches

X̄ ∼ Normal(µ, σ/√n)

For finite n the normal distribution can be used as an approximation. How large n needs to be for the approximation to be acceptable depends on the accuracy required and on the properties of the distribution the Xᵢ come from. Some distributions have no variance; for those the CLT does not apply!
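A minimal simulation sketch of the theorem, using (invented) exponential data with µ = σ = 2 and n = 50, so the averaged distribution is clearly non-normal to begin with:

```python
import random
import statistics

# Sketch: empirically check the CLT by averaging n = 50 draws from a
# right-skewed exponential distribution with mean mu = 2 and sd sigma = 2.
random.seed(1)
n, mu, sigma = 50, 2.0, 2.0
means = [statistics.fmean(random.expovariate(1 / mu) for _ in range(n))
         for _ in range(10_000)]

# The CLT predicts X-bar ~ Normal(mu, sigma / sqrt(n)), approximately:
print(round(statistics.fmean(means), 2))   # close to mu = 2
print(round(statistics.stdev(means), 2))   # close to sigma / sqrt(50) ≈ 0.28
```

A histogram of `means` would look close to the predicted normal curve even though each individual observation is strongly skewed.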

The normal distribution as an approximation

It is the mean value X̄ that becomes approximately normally distributed as the number of observations n increases. The sample itself does not become normally distributed just because n increases! However, some random variables can themselves be interpreted as a sum (or mean) of many (independent) random variables, and in some cases they are then well approximated by a normally distributed variable. Examples are:
- The Binomial distribution with n large and p not too close to 0 or 1.
- The Poisson distribution with a large intensity λ.
- The Gamma distribution with a large α parameter.
- The χ² distribution with many degrees of freedom.
In such cases, the table for the Normal distribution can be used to compute approximate quantiles.
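A sketch of the first example in the list, comparing an exact Binomial probability with its normal approximation (the parameters n = 100, p = 0.3 and the cutoff 35 are chosen only for illustration):

```python
import math
from statistics import NormalDist

# Sketch: normal approximation to Binomial(n = 100, p = 0.3), which is a
# sum of 100 independent Bernoulli variables.
n, p = 100, 0.3
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

# Exact P(X <= 35) versus the continuity-corrected normal approximation.
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(36))
approx = NormalDist(mu, sigma).cdf(35.5)

print(round(exact, 3), round(approx, 3))  # the two probabilities nearly agree
```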

Confidence intervals

A 100(1 − α)% confidence interval for a parameter θ is a formula which from a random sample X₁, ..., Xₙ computes random variables L₁ and L₂ so that, for all θ,

Pr[θ ∈ [L₁, L₂]] = 1 − α

Correct interpretation of a confidence interval: If you generate N new random samples from the same distribution and from these compute N new confidence intervals according to the formula, then the proportion of these that contain θ approaches 100(1 − α)% as N → ∞.

WRONG interpretation of a confidence interval: The interval computed from a sample contains θ with probability 100(1 − α)%.

However, if one interprets θ as a random variable and makes certain assumptions, the second interpretation can become correct. These assumptions are often reasonable, which underlies the popularity of the confidence interval in applications.
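The correct (long-run frequency) interpretation can be sketched by simulation; the values µ = 5, σ = 2, n = 20 below are invented for illustration:

```python
import random
import statistics
from statistics import NormalDist

# Sketch: draw many fresh samples from Normal(mu = 5, sigma = 2) and count
# how often the known-sigma 95% interval actually contains mu. The long-run
# frequency should approach 1 - alpha = 0.95.
random.seed(2)
mu, sigma, n = 5.0, 2.0, 20
z = NormalDist().inv_cdf(0.975)          # z_{alpha/2} for alpha = 0.05

hits = 0
for _ in range(5_000):
    xbar = statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    half = z * sigma / n ** 0.5
    hits += (xbar - half <= mu <= xbar + half)

print(hits / 5_000)  # long-run coverage, close to 0.95
```

Note that each individual interval either contains µ or it does not; the 95% refers to the procedure, not to any single computed interval.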

Example 1: Confidence interval for µ in a Normal(µ, σ) distribution

If X₁, ..., Xₙ ∼ Normal(µ, σ) is a random sample, if z_{α/2} is such that [−z_{α/2}, z_{α/2}] contains 100(1 − α)% of the probability in a standard normal distribution, and if we define

L₁ = X̄ − z_{α/2} σ/√n
L₂ = X̄ + z_{α/2} σ/√n

then [L₁, L₂] is a 100(1 − α)% confidence interval for µ. The proof is based on the fact that X̄ ∼ Normal(µ, σ/√n). Note how we find z_{α/2} in the table for the standard normal distribution. Traditionally we use α = 0.05, giving z_{0.05/2} = 1.96.

Note: The formulas for L₁ and L₂ contain σ, so this interval can only be used if σ², the variance of the distribution, is known. (It is not enough to compute the sample variance of the data.)
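A small sketch of this formula in Python; the data values and the known σ = 3 are invented for illustration:

```python
from statistics import NormalDist, fmean

# Sketch of Example 1: a 95% interval for mu when sigma is assumed known.
data = [12.1, 9.8, 11.4, 10.7, 13.2, 10.1, 11.9, 12.4]  # made-up sample
sigma, alpha = 3.0, 0.05

n, xbar = len(data), fmean(data)
z = NormalDist().inv_cdf(1 - alpha / 2)       # z_{alpha/2} ≈ 1.96
L1 = xbar - z * sigma / n ** 0.5
L2 = xbar + z * sigma / n ** 0.5
print(round(L1, 2), round(L2, 2))             # 95% interval for mu
```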

The distribution of the variance estimator S² = σ̂²

The confidence interval for µ was constructed based on knowing the distribution of the estimator X̄ for µ. In the same way we base a confidence interval for σ² on the estimator σ̂². If X₁, ..., Xₙ ∼ Normal(µ, σ) and we define the estimator

S² = σ̂² = 1/(n − 1) · Σᵢ₌₁ⁿ (Xᵢ − X̄)²

then the distribution of this estimator satisfies (n − 1)S²/σ² ∼ χ²(n − 1), i.e., (n − 1)S²/σ² has a χ² distribution with n − 1 degrees of freedom. A proof can be constructed (see Milton, appendix C) by
- first showing that S² and X̄ are independent random variables (e.g., using moment-generating functions),
- then using this to compute the moment-generating function of (n − 1)S²/σ² and showing that it corresponds to that of the χ²(n − 1) distribution.
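A simulation sketch of this sampling-distribution result (the values n = 10 and σ = 2 are assumed for illustration), using the fact that χ²(n − 1) has expectation n − 1 and variance 2(n − 1):

```python
import random
import statistics

# Sketch: simulate the statistic (n - 1) S^2 / sigma^2 from Normal(0, sigma)
# samples; its simulated mean and variance should roughly match those of a
# chi-square distribution with n - 1 degrees of freedom.
random.seed(3)
n, sigma = 10, 2.0
stats = []
for _ in range(20_000):
    xs = [random.gauss(0, sigma) for _ in range(n)]
    s2 = statistics.variance(xs)             # the estimator S^2 (divisor n - 1)
    stats.append((n - 1) * s2 / sigma**2)

print(round(statistics.fmean(stats), 1))     # close to n - 1 = 9
print(round(statistics.variance(stats), 1))  # close to 2(n - 1) = 18
```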

Example 2: Confidence interval for σ²

If X₁, ..., Xₙ ∼ Normal(µ, σ) is a random sample, if χ²_{n−1,α/2} and χ²_{n−1,1−α/2} are such that [χ²_{n−1,α/2}, χ²_{n−1,1−α/2}] contains 100(1 − α)% of the probability in a χ²(n − 1) distribution, and if we define

L₁ = (n − 1)S²/χ²_{n−1,1−α/2}
L₂ = (n − 1)S²/χ²_{n−1,α/2}

then [L₁, L₂] is a 100(1 − α)% confidence interval for σ². The proof is based on the fact that (n − 1)S²/σ² ∼ χ²(n − 1). Note how we find χ²_{n−1,1−α/2} and χ²_{n−1,α/2} in the table for the χ²(k) distribution.

Note: The formulas for L₁ and L₂ do not contain µ, so this interval can be used even when µ is unknown.
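A sketch of this interval for an invented sample of size n = 10 with α = 0.05; the quantiles χ²_{9,0.025} = 2.700 and χ²_{9,0.975} = 19.023 are taken from a standard χ² table:

```python
import statistics

# Sketch of Example 2: 95% confidence interval for sigma^2, n = 10.
data = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 4.0, 5.2, 4.7, 4.6]  # made-up sample
n = len(data)
s2 = statistics.variance(data)               # S^2 with divisor n - 1

chi2_lo, chi2_hi = 2.700, 19.023             # chi2_{9, 0.025}, chi2_{9, 0.975}
L1 = (n - 1) * s2 / chi2_hi
L2 = (n - 1) * s2 / chi2_lo
print(round(L1, 3), round(L2, 3))            # 95% interval for sigma^2
```

Note the asymmetry of the interval around S², which reflects the skewness of the χ² distribution.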

The (Student) t distribution

The random variable X has a (Student) t distribution with γ degrees of freedom, written X ∼ T(γ), if its density is

f(x) = Γ((γ + 1)/2) / (Γ(γ/2) √(πγ)) · (1 + x²/γ)^(−(γ+1)/2)

As γ → ∞ the t distribution approaches a standard normal distribution. When γ is smaller, the density is more peaked at the center, and has heavier tails, than the standard normal. We have E[X] = 0 (if γ ≤ 1 the expectation does not exist) and Var[X] = γ/(γ − 2) (if γ ≤ 2 the variance does not exist). An important property: If Z ∼ Normal(0, 1) and X ∼ χ²(γ) are independent, then

Z / √(X/γ) ∼ T(γ)

Tables for the Student t distribution for various values of γ are available in Milton.
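The heavier tails can be illustrated numerically by evaluating the density formula above with `math.gamma` and integrating the tail P(X > 3) with a simple trapezoidal rule (the cutoff 3 and the chosen degrees of freedom are only for illustration):

```python
import math
from statistics import NormalDist

# Sketch: t density straight from the formula above.
def t_pdf(x, g):
    c = math.gamma((g + 1) / 2) / (math.gamma(g / 2) * math.sqrt(math.pi * g))
    return c * (1 + x * x / g) ** (-(g + 1) / 2)

# Numerical tail probability P(X > a) via the trapezoidal rule on [a, b].
def t_tail(g, a=3.0, b=1000.0, steps=200_000):
    h = (b - a) / steps
    return h * (sum(t_pdf(a + i * h, g) for i in range(1, steps))
                + (t_pdf(a, g) + t_pdf(b, g)) / 2)

tails = {g: t_tail(g) for g in (2, 5, 30)}
for g, p in tails.items():
    print(g, round(p, 4))                    # heavier tail for smaller gamma
print("normal", round(1 - NormalDist().cdf(3.0), 4))
```

All three t tail probabilities exceed the standard normal tail, and they shrink toward it as γ grows.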

The distribution of (X̄ − µ)/(S/√n)

Our earlier confidence interval for µ depended on σ. We now construct a confidence interval for µ that depends on S² instead. We do this by studying a function of X̄ and S². If X₁, ..., Xₙ ∼ Normal(µ, σ) is a random sample, then

(X̄ − µ)/(S/√n) ∼ T(n − 1)

In other words, the statistic has a t distribution with n − 1 degrees of freedom. A proof can be based on
- X̄ ∼ Normal(µ, σ/√n),
- (n − 1)S²/σ² ∼ χ²(n − 1),
- X̄ and S² being independent,
- the property of the t distribution shown on the previous overhead.

Example 3: A confidence interval for µ not depending on σ

If X₁, ..., Xₙ ∼ Normal(µ, σ) is a random sample, if t_{α/2} is such that [−t_{α/2}, t_{α/2}] contains 100(1 − α)% of the probability in a t distribution with n − 1 degrees of freedom, and if we define

L₁ = X̄ − t_{α/2} S/√n
L₂ = X̄ + t_{α/2} S/√n

then [L₁, L₂] is a 100(1 − α)% confidence interval for µ. The proof is based on the fact that (X̄ − µ)/(S/√n) ∼ T(n − 1). Note how we find t_{α/2} in the table for the t distribution. The formulas for L₁ and L₂ do not contain σ, so this interval can be used when the distribution variance σ² is unknown.
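A sketch of the t interval for the same invented sample of size n = 10 as above, with α = 0.05; the quantile t_{α/2} = 2.262 for 9 degrees of freedom is taken from a standard t table:

```python
from statistics import fmean, stdev

# Sketch of Example 3: 95% t interval for mu with sigma unknown, n = 10.
data = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 4.0, 5.2, 4.7, 4.6]  # made-up sample
n, xbar, s = len(data), fmean(data), stdev(data)

t = 2.262                                   # t_{alpha/2} with n - 1 = 9 df
L1 = xbar - t * s / n ** 0.5
L2 = xbar + t * s / n ** 0.5
print(round(L1, 3), round(L2, 3))           # 95% interval for mu
```

Because 2.262 > 1.96, this interval is somewhat wider than the known-σ interval of Example 1 would be; that is the price of estimating σ from the data.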

Example 4: Approximate confidence interval for µ based on the CLT

If X₁, ..., Xₙ is a random sample from a distribution with expectation µ and variance σ², if z_{α/2} is such that [−z_{α/2}, z_{α/2}] contains 100(1 − α)% of the probability in a standard normal distribution, and if we define

L₁ = X̄ − z_{α/2} S/√n
L₂ = X̄ + z_{α/2} S/√n

where S is the sample standard deviation, then [L₁, L₂] is an approximate 100(1 − α)% confidence interval for µ if n is large. The proof uses that for large n we have, approximately, X̄ ∼ Normal(µ, σ/√n), together with the approximation S ≈ σ.

Note: For this to hold, the variance σ² must exist. One can always compute S from a sample; that S exists does not imply that σ exists!
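A simulation sketch of the approximation, checking the coverage of this interval for non-normal data; the exponential distribution with µ = 2 and the sample size n = 200 are assumed for illustration:

```python
import random
import statistics
from statistics import NormalDist

# Sketch of Example 4: approximate 95% intervals for the mean of an
# exponential distribution (so mu = sigma = 2), using S in place of sigma.
# With n = 200 the large-sample coverage should be close to 95%.
random.seed(4)
mu, n = 2.0, 200
z = NormalDist().inv_cdf(0.975)

hits = 0
for _ in range(3_000):
    xs = [random.expovariate(1 / mu) for _ in range(n)]
    xbar, s = statistics.fmean(xs), statistics.stdev(xs)
    half = z * s / n ** 0.5
    hits += (xbar - half <= mu <= xbar + half)

print(hits / 3_000)  # close to 0.95
```

For small n the coverage can fall noticeably below 95% for skewed distributions; the interval is only asymptotically correct.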