Practice Problems

Practice Problems

Section       Problems
4-2           4-1 to 4-9
4-3           4-13, 14, 15, 17, 19, 20
4-4           4-32, 34, 36, 38
4-5           4-47, 49, 52, 54, 55
4-6           4-59, 60, 63
4-7           4-66, 68, 69, 70, 74
4-8           4-79, 81, 84
4-10          4-85, 86, 88, 90
Supplemental  4-94, 96

4-1 Statistical Inference
The field of statistical inference consists of those methods used to make decisions or draw conclusions about a population. These methods use the information contained in a sample from the population to draw conclusions.

4-1 Statistical Inference
4-2 Point Estimation

4-2 Point Estimation

General Concepts of Point Estimation
Point Estimator: A point estimator of a parameter θ is a single number that can be regarded as a sensible value for θ. A point estimator is obtained by selecting a suitable statistic and computing its value from the given sample data.

Unbiased Estimator
A point estimator θ̂ is said to be an unbiased estimator of θ if E(θ̂) = θ for every possible value of θ. If θ̂ is biased, the difference E(θ̂) − θ is called the bias of θ̂.

[Figure: the pdfs of a biased estimator θ̂₁ and an unbiased estimator θ̂₂ for a parameter θ, with the bias of θ̂₁ marked.]

Unbiased Estimator
When X is a binomial rv with parameters n and p, the sample proportion p̂ = X/n is an unbiased estimator of p.

Principle of Unbiased Estimation
When choosing among several different estimators of θ, select one that is unbiased.

Unbiased Estimates: A point estimate θ̂ of a parameter θ is unbiased if E(θ̂) = θ; otherwise the bias is E(θ̂) − θ.

Unbiased Estimator
- Unbiased estimate of the success probability of a binomial rv X: p̂ = X/n
- Unbiased estimate of the population mean μ: μ̂ = X̄
- Unbiased estimate of the population variance σ²: σ̂² = S² = (1/(n−1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)²
Note: capital letters denote random variables (theoretical quantities).

Principle of Minimum Variance Unbiased Estimation
Among all estimators of θ that are unbiased, choose the one that has the minimum variance. The resulting θ̂ is called the minimum variance unbiased estimator (MVUE) of θ.
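The role of the 1/(n−1) divisor in S² can be seen in a small simulation. This is a sketch, not from the slides; the population values (μ = 0, σ² = 4) and sample size are made up for illustration.

```python
import random
import statistics

# Sketch: dividing the sum of squared deviations by n systematically
# underestimates sigma^2; dividing by n-1 makes the estimator unbiased.
random.seed(1)
mu, sigma, n, reps = 0.0, 2.0, 5, 20000  # true sigma^2 = 4

biased, unbiased = [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)
    biased.append(ss / n)          # divisor n: E = sigma^2 * (n-1)/n = 3.2
    unbiased.append(ss / (n - 1))  # divisor n-1: E = sigma^2 = 4

print(statistics.mean(biased), statistics.mean(unbiased))
```

The first average settles near 3.2 (the (n−1)/n shrinkage) while the second settles near the true value 4.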

[Figure: graphs of the pdfs of two different unbiased estimators θ̂₁ and θ̂₂.]

MVUE for a Normal Distribution
Let X₁, X₂, ..., Xₙ be a random sample from a normal distribution with parameters μ and σ². Then the estimator μ̂ = X̄ is the MVUE for μ.

[Figure: a biased estimator θ̂₁ that is preferable to the MVUE θ̂₂.]

Mean Square Error
The mean square error is the expectation of the squared deviation of the point estimator from the parameter of interest:
MSE(θ̂) = E(θ̂ − θ)² = Var(θ̂) + (bias)²
In general, the variance of good point estimators decreases as the sample size increases.

Relative Efficiency
We can compare estimators by their MSE; the one with the smaller MSE is the better estimator. Another way is to compute the relative efficiency:
relative efficiency = MSE(θ̂₁) / MSE(θ̂₂)
If the relative efficiency is greater than 1, θ̂₂ is the better estimator.

Standard Error
The standard error of an estimator θ̂ is its standard deviation, σ_θ̂ = √V(θ̂). If the standard error itself involves unknown parameters whose values can be estimated, substituting those estimates into σ_θ̂ yields the estimated standard error of the estimator, denoted σ̂_θ̂ or s_θ̂.

Methods of Point Estimation: Method of Moments
Let X₁, X₂, ..., Xₙ be a random sample from a pmf or pdf f(x). For k = 1, 2, ..., the kth population moment, or kth moment of the distribution f(x), is E(Xᵏ). The kth sample moment is (1/n) Σᵢ₌₁ⁿ Xᵢᵏ.

Moment Estimators
Let X₁, X₂, ..., Xₙ be a random sample from a distribution with pmf or pdf f(x; θ₁, ..., θₘ), where θ₁, ..., θₘ are parameters whose values are unknown. The moment estimators θ̂₁, ..., θ̂ₘ are obtained by equating the first m sample moments to the corresponding first m population moments and solving for θ₁, ..., θₘ.

Practice: exponential, normal, gamma; other distributions on your own.
Note: try them all; one of them may be on the final exam.
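As a sketch of the exponential case from the practice list (not worked in the slides): for an exponential distribution with rate λ, E(X) = 1/λ, so equating the first sample moment to the first population moment gives λ̂ = 1/x̄. The true rate and sample size below are made up.

```python
import random

# Sketch: method-of-moments estimator for the exponential rate.
# Equate xbar (first sample moment) to E(X) = 1/lambda and solve.
random.seed(2)
lam = 0.5
data = [random.expovariate(lam) for _ in range(10000)]

xbar = sum(data) / len(data)  # first sample moment
lam_hat = 1.0 / xbar          # solve 1/lambda = xbar

print(lam_hat)  # close to the true lambda = 0.5
```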

Parameter Estimation Using the Method of Maximum Likelihood
The idea: consider a set of data values x₁, x₂, ..., xₙ that are independent observations from a population with probability density function f(x; θ). For a single observation, say x₁, the probability density of observing it is f(x₁; θ). For a sample containing the whole set of data values, the probability density of observing these observations is the joint density function
f(x₁, x₂, ..., xₙ; θ) = f(x₁; θ) f(x₂; θ) ··· f(xₙ; θ)

Parameter Estimation Using MLE
Call this joint density function the likelihood function, and the estimate obtained from it the maximum likelihood estimate (MLE):
L(θ; x₁, x₂, ..., xₙ) = f(x₁; θ) f(x₂; θ) ··· f(xₙ; θ)
The goal is to find the value of the parameter θ that maximizes the probability of observing this set of data in our sample.

Parameter Estimation Using MLE
This is done by taking the derivative of the likelihood function with respect to the parameter. It is often more convenient to take the natural log of the likelihood function before differentiating. This can be done because the natural log is monotonic; maximizing the log-likelihood is therefore equivalent to maximizing the likelihood. The steps are as follows:
1. Write f(x; θ).
2. Write L(θ) = f(x₁; θ) f(x₂; θ) ··· f(xₙ; θ).
3. Rearrange the terms using sums and products.
4. Take the natural log, log L.
5. Take the 1st derivative with respect to the parameter and set it to zero, d log L / dθ = 0 (or the gradient, for more than one parameter).
6. Take the 2nd derivative and check whether it is a maximum, i.e. it should be negative, d² log L / dθ² < 0 (or the Hessian must be ND or NSD, for more than one parameter).

An Example: Poisson Distribution
The steps are as follows:
1. Write f(x; λ) = e^(−λ) λˣ / x!
2. Write L(λ) = f(x₁; λ) f(x₂; λ) ··· f(xₙ; λ) = (e^(−λ) λ^(x₁) / x₁!) (e^(−λ) λ^(x₂) / x₂!) ··· (e^(−λ) λ^(xₙ) / xₙ!)
3. Rearrange the terms: L(λ) = e^(−nλ) λ^(Σᵢ xᵢ) / Πᵢ xᵢ!
4. Take the natural log: log L = −nλ + (Σᵢ xᵢ) log λ − Σᵢ log xᵢ!

An Example: Poisson Distribution (cont.)
5. Take the 1st derivative with respect to the parameter and set it to zero:
d log L / dλ = −n + (Σᵢ xᵢ)/λ = 0  ⇒  λ̂ = Σᵢ xᵢ / n = x̄
6. Take the 2nd derivative and check whether it is a maximum; it should be negative:
d² log L / dλ² = −(Σᵢ xᵢ)/λ² < 0
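The derivation above can be checked numerically: scanning the Poisson log-likelihood over a grid of λ values should put the maximum at x̄. This sketch uses made-up count data, not data from the slides.

```python
import math

# Sketch: confirm the Poisson MLE lambda_hat = xbar by brute force.
data = [2, 4, 3, 5, 1, 3, 4, 2, 3, 3]  # made-up Poisson-style counts
n, s = len(data), sum(data)

def log_lik(lam):
    # log L = -n*lambda + (sum x_i) * log(lambda) - sum log(x_i!)
    return -n * lam + s * math.log(lam) - sum(math.log(math.factorial(x)) for x in data)

grid = [0.01 * k for k in range(1, 1001)]  # lambda values in (0, 10]
best = max(grid, key=log_lik)
print(best, s / n)  # the maximizing grid point sits at xbar
```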

The Invariance Principle
Let θ̂₁, ..., θ̂ₘ be the MLEs of the parameters θ₁, ..., θₘ. Then the MLE of any function h(θ₁, ..., θₘ) of these parameters is the same function of the MLEs, h(θ̂₁, ..., θ̂ₘ).

Desirable Property of the Maximum Likelihood Estimate
Under very general conditions on the joint distribution of the sample, when the sample size n is large, the maximum likelihood estimator of any parameter θ is approximately unbiased [E(θ̂) ≈ θ] and has variance that is nearly as small as can be achieved by any estimator: θ̂_MLE ≈ MVUE of θ.

Practice: geometric, exponential, Weibull; other distributions on your own.
Note: try them all; one of them may be on the final exam.

4-3 Hypothesis Testing
4-3.1 Statistical Hypotheses
We like to think of statistical hypothesis testing as the data-analysis stage of a comparative experiment, in which the engineer is interested, for example, in comparing the mean of a population to a specified value (e.g., mean pull strength).

4-3 Hypothesis Testing
4-3.1 Statistical Hypotheses
For example, suppose that we are interested in the burning rate of a solid propellant used to power aircrew escape systems. Burning rate is a random variable that can be described by a probability distribution. Suppose that our interest focuses on the mean burning rate (a parameter of this distribution). Specifically, we are interested in deciding whether or not the mean burning rate is 50 centimeters per second.

Two-sided alternative hypothesis: H₀: μ = 50 cm/s versus H₁: μ ≠ 50 cm/s
One-sided alternative hypotheses: H₀: μ = 50 cm/s versus H₁: μ < 50 cm/s (or H₁: μ > 50 cm/s)

4-3 Hypothesis Testing
4-3.1 Statistical Hypotheses
Test of a hypothesis: a procedure leading to a decision about a particular hypothesis. Hypothesis-testing procedures rely on using the information in a random sample from the population of interest. If this information is consistent with the hypothesis, then we will conclude that the hypothesis is true; if this information is inconsistent with the hypothesis, we will conclude that the hypothesis is false.

4-3.2 Testing Statistical Hypotheses

4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
Sometimes the type I error probability α is called the significance level, the α-error, or the size of the test.


4-3 Hypothesis Testing
4-3.2 Testing Statistical Hypotheses
The power is computed as 1 − β, and can be interpreted as the probability of correctly rejecting a false null hypothesis. We often compare statistical tests by comparing their power properties. For example, consider the propellant burning-rate problem when we are testing H₀: μ = 50 centimeters per second against H₁: μ ≠ 50 centimeters per second. Suppose that the true value of the mean is μ = 52. When n = 10, we found that β = 0.2643, so the power of this test is 1 − β = 1 − 0.2643 = 0.7357 when μ = 52.
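The β above can be reproduced numerically. This sketch assumes the textbook setup for this example, an acceptance region 48.5 ≤ x̄ ≤ 51.5 and σ = 2.5; neither value is stated in this excerpt.

```python
from math import erf, sqrt

# Sketch: beta = P(accept H0 | mu = 52) for the burning-rate example.
# Acceptance region [48.5, 51.5] and sigma = 2.5 are assumed values.
def phi(z):
    # standard normal cdf
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

sigma, n, true_mu = 2.5, 10, 52.0
se = sigma / sqrt(n)

beta = phi((51.5 - true_mu) / se) - phi((48.5 - true_mu) / se)
power = 1.0 - beta
print(round(beta, 3), round(power, 3))  # ~0.264 and ~0.736
```

The slides' 0.2643 corresponds to rounding the z value to −0.63 before consulting the normal table; the direct calculation gives 0.2635.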

4-3 Hypothesis Testing
4-3.3 One-Sided and Two-Sided Hypotheses
Two-sided test: H₀: μ = μ₀ versus H₁: μ ≠ μ₀
One-sided tests: H₀: μ = μ₀ versus H₁: μ > μ₀, or H₀: μ = μ₀ versus H₁: μ < μ₀

How to write a hypothesis statement, for a 2-sided hypothesis: plain and simple.
- Write hypotheses in terms of population parameters, NOT statistics.
- Write a specific statement in the null hypothesis.
- The null hypothesis and alternative hypothesis should cover the space of the parameter values.

How to write a hypothesis statement, for a 1-sided hypothesis: be careful, it can be confusing sometimes.
- Look for key words: "more than" and "greater than" mean >; "at least" means ≥ (in this context, "at least" is not >).
- Similarly, "less than" means <, while "no more than" and "not exceeding" mean ≤ (not <).
- Translate words like slower, faster, better, "at least as low as," etc. correctly.

For a 1-sided hypothesis (continued):
- Translate the question of interest in the problem statement into a mathematical/logical sign, then place that statement in the alternative hypothesis.
- Then put the opposite statement, with an equality sign (=, ≤, or ≥), in the null hypothesis.
- Remember, the alternative hypothesis should always contain the statement we want to have sufficient evidence to support, the statement we want to conclude with confidence. Then turn these statements into the proper signs when writing the hypotheses.

4-3 Hypothesis Testing
4-3.4 P-Values in Hypothesis Testing

4-3.5 General Procedure for Hypothesis Testing

Preferred Procedure
Steps 1-4 and 6 are the same; skip step 5; step 7 is concluded based on the p-value.
General criteria (in this class):
- P-value greater than 0.1: do not reject.
- P-value less than 0.01: reject.
- P-value between 0.01 and 0.1: grey area; need more data.
Note: less conservative people/applications may use 0.05 instead of 0.01 (I don't recommend it); more conservative people/applications may use a criterion below 0.01, depending on experience.

4-4 Inference on the Mean of a Population, Variance Known
Assumptions

4-4 Inference on the Mean of a Population, Variance Known
4-4.1 Hypothesis Testing on the Mean
We wish to test H₀: μ = μ₀ versus H₁: μ ≠ μ₀. The test statistic is
Z₀ = (X̄ − μ₀) / (σ/√n)

Reject H₀ if the observed value of the test statistic z₀ is either z₀ > z_{α/2} or z₀ < −z_{α/2}. Fail to reject H₀ if −z_{α/2} ≤ z₀ ≤ z_{α/2}.
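A sketch of this two-sided z-test; the data values, μ₀, and σ below are made up for illustration.

```python
from math import erf, sqrt

# Sketch: two-sided z-test of H0: mu = mu0 with known sigma.
def z_test_two_sided(data, mu0, sigma):
    n = len(data)
    xbar = sum(data) / n
    z0 = (xbar - mu0) / (sigma / sqrt(n))
    phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal cdf
    p_value = 2.0 * (1.0 - phi(abs(z0)))  # two-sided tail probability
    return z0, p_value

data = [51.2, 50.8, 49.7, 52.1, 50.5, 51.0, 49.9, 51.6]
z0, p = z_test_two_sided(data, mu0=50.0, sigma=1.0)
print(round(z0, 3), round(p, 4))
```

At α = 0.05 the observed z₀ ≈ 2.40 exceeds z₀.₀₂₅ = 1.96, so H₀ would be rejected, matching the small p-value.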


4-4 Inference on the Mean of a Population, Variance Known
4-4.2 Type II Error and Choice of Sample Size
Finding the probability of type II error: for the two-sided test with a true mean μ = μ₀ + δ,
β = Φ(z_{α/2} − δ√n/σ) − Φ(−z_{α/2} − δ√n/σ)

Sample size formulas: for a two-sided alternative,
n ≈ (z_{α/2} + z_β)² σ² / δ²
and for a one-sided alternative,
n = (z_α + z_β)² σ² / δ²
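A sketch of the standard two-sided sample-size formula n = (z_{α/2} + z_β)² σ²/δ²; the shift δ, σ, and quantile values below are made up (1.96 and 0.84 are the usual normal quantiles for α = 0.05 and β = 0.20).

```python
from math import ceil

# Sketch: approximate sample size for a two-sided z-test.
def sample_size_two_sided(z_alpha2, z_beta, sigma, delta):
    n = (z_alpha2 + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)  # round up to the next whole observation

# detect a shift of delta = 1.5 with sigma = 2.5, alpha = 0.05, power = 0.80
print(sample_size_two_sided(1.96, 0.84, 2.5, 1.5))  # 22
```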

4-4 Inference on the Mean of a Population, Variance Known
4-4.3 Large-Sample Test
In general, if n ≥ 30, the sample variance s² will be close to σ² for most samples, and so s can be substituted for σ in the test procedures with little harmful effect.

4-4 Inference on the Mean of a Population, Variance Known
4-4.4 Some Practical Comments on Hypothesis Testing
The seven-step procedure: only three steps are really required.

Statistical versus Practical Significance

4-4 Inference on the Mean of a Population, Variance Known
4-4.5 Confidence Interval on the Mean
Two-sided confidence interval:
x̄ − z_{α/2} σ/√n ≤ μ ≤ x̄ + z_{α/2} σ/√n
One-sided confidence intervals:
μ ≤ x̄ + z_α σ/√n  or  μ ≥ x̄ − z_α σ/√n
Confidence coefficient: 1 − α


4-4 Inference on the Mean of a Population, Variance Known
4-4.5 Confidence Interval on the Mean
Relationship between tests of hypotheses and confidence intervals: if [l, u] is a 100(1 − α) percent confidence interval for the parameter θ, then the significance-level-α test of the hypothesis H₀: θ = θ₀ versus H₁: θ ≠ θ₀ will lead to rejection of H₀ if and only if the hypothesized value θ₀ is not in the 100(1 − α) percent confidence interval [l, u].

4-4 Inference on the Mean of a Population, Variance Known
4-4.5 Confidence Interval on the Mean
Confidence level and precision of estimation: the length of the two-sided 95% confidence interval is 2(1.96)σ/√n = 3.92 σ/√n, whereas the length of the two-sided 99% confidence interval is 2(2.575)σ/√n = 5.15 σ/√n.
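A sketch comparing the two interval lengths; x̄, σ, and n are made up, and 1.96 and 2.575 are the normal quantiles quoted above.

```python
from math import sqrt

# Sketch: two-sided z confidence interval for the mean with known sigma.
def z_interval(xbar, sigma, n, z):
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

sigma, n, xbar = 2.0, 16, 10.0
lo95, hi95 = z_interval(xbar, sigma, n, 1.96)   # 95% interval
lo99, hi99 = z_interval(xbar, sigma, n, 2.575)  # 99% interval
print(hi95 - lo95, hi99 - lo99)  # the 99% interval is wider
```

Higher confidence buys a longer (less precise) interval, exactly the 3.92 versus 5.15 trade-off above.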

4-4 Inference on the Mean of a Population, Variance Known
4-4.5 Confidence Interval on the Mean
Choice of sample size: if x̄ is used as an estimate of μ, we can be 100(1 − α)% confident that the error |x̄ − μ| will not exceed a specified amount E when the sample size is
n = (z_{α/2} σ / E)²

4-4 Inference on the Mean of a Population, Variance Known
4-4.5 Confidence Interval on the Mean
One-sided confidence bounds

4-4.6 General Method for Deriving a Confidence Interval

4-5 Inference on the Mean of a Population, Variance Unknown
4-5.1 Hypothesis Testing on the Mean
To test H₀: μ = μ₀ when σ² is unknown, the test statistic is
T₀ = (X̄ − μ₀) / (S/√n)
which follows the t distribution with n − 1 degrees of freedom when H₀ is true.

4-5 Inference on the Mean of a Population, Variance Unknown
4-5.1 Hypothesis Testing on the Mean
Calculating the P-value
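A sketch computing the one-sample t statistic T₀ = (x̄ − μ₀)/(s/√n); the data and μ₀ are made up. The resulting t₀ would then be compared against a t table with n − 1 degrees of freedom (or converted to a p-value with statistical software).

```python
from math import sqrt
import statistics

# Sketch: one-sample t statistic for H0: mu = mu0 with unknown sigma.
def t_statistic(data, mu0):
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return (xbar - mu0) / (s / sqrt(n)), n - 1

data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4]
t0, df = t_statistic(data, mu0=10.0)
print(round(t0, 3), df)
```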


4-5 Inference on the Mean of a Population, Variance Unknown
4-5.2 Type II Error and Choice of Sample Size
Fortunately, this unpleasant task has already been done, and the results are summarized in a series of graphs in Appendix A, Charts Va, Vb, Vc, and Vd, that plot β for the t-test against a parameter d for various sample sizes n. These graphs are called operating characteristic (or OC) curves. Curves are provided for two-sided alternatives on Charts Va and Vb. The abscissa scale factor d on these charts is defined as
d = |μ − μ₀| / σ

4-5 Inference on the Mean of a Population, Variance Unknown
4-5.3 Confidence Interval on the Mean
x̄ − t_{α/2, n−1} s/√n ≤ μ ≤ x̄ + t_{α/2, n−1} s/√n

4-5 Inference on the Mean of a Population, Variance Unknown
4-5.4 Confidence Interval on the Mean

4-6 Inference on the Variance of a Normal Population
4-6.1 Hypothesis Testing on the Variance of a Normal Population

4-6 Inference on the Variance of a Normal Population
4-6.1 Hypothesis Testing on the Variance of a Normal Population
To test H₀: σ² = σ₀², the test statistic is
X₀² = (n − 1)S² / σ₀²
which follows the chi-square distribution with n − 1 degrees of freedom when H₀ is true.

4-6.2 Confidence Interval on the Variance of a Normal Population
(n − 1)s² / χ²_{α/2, n−1} ≤ σ² ≤ (n − 1)s² / χ²_{1−α/2, n−1}
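A sketch computing the chi-square statistic X₀² = (n − 1)s²/σ₀²; the data and σ₀² are made up. The observed x₀² would be compared against chi-square critical values with n − 1 degrees of freedom.

```python
import statistics

# Sketch: chi-square test statistic for H0: sigma^2 = sigma0^2.
def chi_square_variance_stat(data, sigma0_sq):
    n = len(data)
    s_sq = statistics.variance(data)  # sample variance (n - 1 divisor)
    return (n - 1) * s_sq / sigma0_sq, n - 1

data = [4.1, 3.8, 4.4, 4.0, 3.9, 4.3, 4.2, 3.7]
x0_sq, df = chi_square_variance_stat(data, sigma0_sq=0.04)
print(round(x0_sq, 2), df)
```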

4-7 Inference on Population Proportion
4-7.1 Hypothesis Testing on a Binomial Proportion
We will consider testing H₀: p = p₀ versus H₁: p ≠ p₀. The test statistic is
Z₀ = (X − np₀) / √(np₀(1 − p₀))
which is approximately standard normal when H₀ is true and n is large.

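A sketch of the large-sample test of H₀: p = p₀ using the standard statistic Z₀ = (X − np₀)/√(np₀(1 − p₀)); the counts below are made up.

```python
from math import erf, sqrt

# Sketch: large-sample z-test for a binomial proportion.
def proportion_z_test(x, n, p0):
    z0 = (x - n * p0) / sqrt(n * p0 * (1.0 - p0))
    phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal cdf
    p_value = 2.0 * (1.0 - phi(abs(z0)))  # two-sided
    return z0, p_value

# 4 defectives in 200 parts; test H0: p = 0.05 vs H1: p != 0.05
z0, p = proportion_z_test(x=4, n=200, p0=0.05)
print(round(z0, 2), round(p, 4))
```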

4-7 Inference on Population Proportion
4-7.2 Type II Error and Choice of Sample Size

4-7 Inference on Population Proportion
4-7.3 Confidence Interval on a Binomial Proportion
p̂ − z_{α/2} √(p̂(1 − p̂)/n) ≤ p ≤ p̂ + z_{α/2} √(p̂(1 − p̂)/n)
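A sketch of this large-sample proportion interval; the counts are made up and 1.96 is the usual 95% normal quantile.

```python
from math import sqrt

# Sketch: large-sample confidence interval for a binomial proportion,
# p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
def proportion_ci(x, n, z=1.96):
    p_hat = x / n
    half = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half

lo, hi = proportion_ci(x=12, n=85)
print(round(lo, 3), round(hi, 3))
```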

4-7 Inference on Population Proportion
4-7.3 Confidence Interval on a Binomial Proportion
Choice of sample size

4-8 Other Interval Estimates for a Single Sample
4-8.1 Prediction Interval

4-8 Other Interval Estimates for a Single Sample
4-8.2 Tolerance Intervals for a Normal Distribution

4-10 Testing for Goodness of Fit
So far, we have assumed that the population or probability distribution for a particular problem is known. There are many instances where the underlying distribution is not known, and we wish to test a particular distribution. We use a goodness-of-fit test procedure based on the chi-square distribution.

Goodness-of-Fit Tests
Conduct hypothesis testing on the data distribution:
- Chi-square test
- Likelihood ratio test
No single correct distribution exists in a real application:
- If very little data are available, it is unlikely that any candidate distribution will be rejected.
- If a lot of data are available, it is likely that all candidate distributions will be rejected.

Chi-Square Test
Intuition: compare the histogram of the data to the shape of the candidate density or mass function. The test is valid for large sample sizes and when parameters are estimated by maximum likelihood. Arranging the n observations into a set of k class intervals or cells, the test statistic is
X₀² = Σᵢ₌₁ᵏ (Oᵢ − Eᵢ)² / Eᵢ
where Oᵢ is the observed frequency in the ith interval and Eᵢ = n·pᵢ is the expected frequency, with pᵢ the theoretical probability of the ith interval (suggested minimum Eᵢ = 5). The statistic approximately follows the chi-square distribution with k − s − 1 degrees of freedom, where s is the number of parameters of the hypothesized distribution estimated from the sample statistics.

Chi-Square Test
The hypotheses of a chi-square test are:
H₀: the random variable X conforms to the distributional assumption with the given parameters (or with parameters estimated from the sample).
H₁: the random variable X does not conform.
Example:
H₀: student weights (X) are normally distributed (distributional assumption) with a mean weight of 55 kg (given parameter); or
H₀: student weights (X) are normally distributed (distributional assumption), in which case the parameter values are estimated from the sample.
H₁: not the case.

Chi-Square Test
If the distribution tested is discrete and combining adjacent cells is not required (so that each Eᵢ exceeds the minimum requirement): usually Eᵢ is required to be at least 5; each value of the random variable should be a class interval, unless combining is necessary, and
pᵢ = p(xᵢ) = P(X = xᵢ)

Chi-Square Test
If the distribution tested is continuous:
pᵢ = ∫ from aᵢ₋₁ to aᵢ of f(x) dx = F(aᵢ) − F(aᵢ₋₁)
where aᵢ₋₁ and aᵢ are the endpoints of the ith class interval, f(x) is the assumed pdf, and F(x) is the assumed cdf.

Recommended number of class intervals (k):
Sample size n = 20: do not use the chi-square test
n = 50: k = 5 to 10
n = 100: k = 10 to 20
n > 100: k = n^(1/2) to n/5
Caution: different groupings of the data (i.e., different k) can affect the hypothesis-testing result.

Vehicle Arrival Example:
H₀: the random variable is Poisson distributed.
H₁: the random variable is not Poisson distributed.
The expected frequencies are Eᵢ = n·p(x) = n e^(−α) αˣ / x!, with α estimated by the sample mean.

x    | Observed Oᵢ | Expected Eᵢ | (Oᵢ − Eᵢ)²/Eᵢ
0    | 12          | 2.6         | 7.87 (combined with x = 1)
1    | 10          | 9.6         |
2    | 19          | 17.4        | 0.15
3    | 17          | 21.1        | 0.8
4    | 10          | 19.2        | 4.41
5    | 8           | 14.0        | 2.57
6    | 7           | 8.5         | 0.26
7    | 5           | 4.4         | 11.62 (combined with x ≥ 8)
8    | 5           | 2.0         |
9    | 3           | 0.8         |
10   | 3           | 0.3         |
≥ 11 | 1           | 0.1         |
Total| 100         | 100.0       | 27.68
(cells combined because of the minimum Eᵢ requirement)

The degrees of freedom are k − s − 1 = 7 − 1 − 1 = 5; since X₀² = 27.68 > χ²₀.₀₅,₅ = 11.1, the hypothesis is rejected at the 0.05 level of significance.
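The vehicle-arrival statistic can be recomputed from the table after combining the low-expected-count cells ({0, 1} and {7, ..., ≥ 11}):

```python
# Sketch: chi-square statistic for the vehicle-arrival example,
# using the observed and expected frequencies from the table with
# low-E cells combined.
observed = [12 + 10, 19, 17, 10, 8, 7, 5 + 5 + 3 + 3 + 1]
expected = [2.6 + 9.6, 17.4, 21.1, 19.2, 14.0, 8.5, 4.4 + 2.0 + 0.8 + 0.3 + 0.1]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1  # k - s - 1, with s = 1 parameter estimated
print(round(chi_sq, 1), df)  # ~27.7 with 5 degrees of freedom
```

The slight difference from the table's 27.68 comes from the table's per-cell rounding.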

Steps in the Chi-Square Test
H₀: X₁, X₂, ..., Xₙ are IID random variables with distribution function F̂.
Note that accepting H₀ does not mean that the data are truly distributed as F̂!

Step 1: Divide the range of the fitted distribution into k adjacent intervals [a₀, a₁), [a₁, a₂), ..., [a_{k−1}, a_k); a₀ may be −∞, or a_k may be +∞, or both.
Step 2: Count the observed frequency Oⱼ in each interval.

Steps in the Chi-Square Test (cont.)
Step 3: Compute the expected proportion pⱼ in each interval:
- continuous case: pⱼ = ∫ from aⱼ₋₁ to aⱼ of f̂(x) dx, where f̂(x) is the fitted pdf;
- discrete case: pⱼ = Σ over aⱼ₋₁ ≤ xᵢ < aⱼ of p̂(xᵢ), where p̂(x) is the fitted pmf.
Then compute the expected frequency Eⱼ = n·pⱼ in each interval.
Step 4: Compute the chi-square test statistic
X₀² = Σⱼ₌₁ᵏ (Oⱼ − Eⱼ)² / Eⱼ
We reject H₀ with confidence (1 − α)·100% if X₀² > χ²_{1−α, k−m−1}, where m = number of estimated parameters.

Setting the Equiprobable Intervals
Equiprobable intervals use p₁ = p₂ = ... = p_k. To set each interval so that Eⱼ = n·pⱼ ≥ 5, let k ≤ n/5. With expected probability pⱼ = 1/k, the expected frequency in each interval is Eⱼ = n/k. Letting the endpoints be aⱼ, j = 1, 2, ..., k, we solve for each aⱼ from
F̂(aⱼ) = j/k

Exercise: Chi-Square Test
In-class exercise: suppose that 50 interarrival times (in minutes) are collected over a 100-minute interval. Test the null hypothesis that the interarrival times are exponentially distributed. The sorted data are:
0.04 0.07 0.1 0.1 0.1 0.26 0.3 0.44 0.46 0.5 0.53 0.64 0.67 0.76 0.83 0.85 1.0 1.07 1.08 1.09 1.1 1.35 1.4 1.46 1.53 1.63 1.89 1.95 2.0 2.04 2.06 2.11 2.24 2.26 2.34 2.44 2.54 2.74 2.8 2.88 2.9 3.15 3.19 3.93 4.57 5.37 5.55 6.58 8.3
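For an exponential fit, F̂(a) = 1 − e^(−a/x̄), so solving F̂(aⱼ) = j/k gives aⱼ = −x̄ ln(1 − j/k). A sketch, assuming the sample mean x̄ = 1.95 (consistent with the endpoints in the solution below, though not stated explicitly in the excerpt):

```python
from math import log

# Sketch: equiprobable interval endpoints for a fitted exponential,
# a_j = -xbar * ln(1 - j/k). xbar = 1.95 is an assumed value.
xbar, k = 1.95, 8
endpoints = [-xbar * log(1.0 - j / k) for j in range(1, k)]
print([round(a, 2) for a in endpoints])
```

This reproduces the endpoints 0.26, 0.56, 0.92, 1.35, 1.91, 2.70, 4.05 used in the exercise solution.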

In-Class Exercise (cont.)
Endpoints: a₁, ..., a₇ = 0.26, 0.56, 0.92, 1.35, 1.91, 2.70, 4.05 (with a₀ = 0 and a₈ = ∞).

Interval     | Observed Oⱼ | Expected Eⱼ | (Oⱼ − Eⱼ)²/Eⱼ
[0, 0.26)    | 5           | 6.25        | 0.25
[0.26, 0.56) | 6           | 6.25        | 0.01
[0.56, 0.92) | 5           | 6.25        | 0.25
[0.92, 1.35) | 5           | 6.25        | 0.25
[1.35, 1.91) | 6           | 6.25        | 0.01
[1.91, 2.70) | 10          | 6.25        | 2.25
[2.70, 4.05) | 7           | 6.25        | 0.09
[4.05, ∞)    | 5           | 6.25        | 0.25
Total: X₀² = 3.36

The degrees of freedom are k − m − 1 = 8 − 1 − 1 = 6. The critical value from the chi-square table is χ²₆,₀.₀₅ = 12.59. Therefore, we cannot reject the hypothesis that the data are exponentially distributed (X₀² = 3.36 < 12.59).
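The table's statistic follows directly from the observed counts, with equiprobable expected counts Eⱼ = n/k = 50/8 = 6.25:

```python
# Sketch: chi-square statistic for the interarrival-time exercise,
# using the observed counts from the table.
observed = [5, 6, 5, 5, 6, 10, 7, 5]
expected = 6.25  # n/k = 50/8

chi_sq = sum((o - expected) ** 2 / expected for o in observed)
print(round(chi_sq, 2))  # 3.36, below the critical value 12.59
```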

Likelihood Ratio Test
A better test than the chi-square test: instead of the X² statistic, compute the likelihood-ratio G statistic
G = 2 Σᵢ₌₁ᵏ Oᵢ ln(Oᵢ / Eᵢ)
In the previous example, this G statistic would be
G = 2 [5 ln(5/6.25) + 6 ln(6/6.25) + ... + 5 ln(5/6.25)] = 1.081
which has a higher p-value (easier NOT to reject).
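A sketch of the G computation for the same observed counts and equiprobable expected counts as the chi-square exercise:

```python
from math import log

# Sketch: likelihood-ratio G statistic, G = 2 * sum O_i * ln(O_i / E_i).
observed = [5, 6, 5, 5, 6, 10, 7, 5]
expected = 6.25

g = 2.0 * sum(o * log(o / expected) for o in observed)
print(round(g, 3))  # 1.081
```

Since 1.081 < 3.36, G is compared against the same chi-square critical value but sits further from rejection, matching the "higher p-value" remark.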