Math 494: Mathematical Statistics

Math 494: Mathematical Statistics
Instructor: Jimin Ding (jmding@wustl.edu)
Department of Mathematics, Washington University in St. Louis
Class materials are available on the course website (www.math.wustl.edu/~jmding/math494/).
Spring 2018

Introduction to Statistical Inferences

Statistical Problem

A typical statistical problem can be summarized in the following flow:

random experiments → data → analysis → inferences

The random experiments may come from the natural sciences, the social sciences, or engineered systems. Sometimes they are carefully designed to control the factors of interest (experimental design), and sometimes they arise from real-world processes (observational studies).

The collected data could be scalars (exam responses), vectors (stock prices), matrices (digital images), arrays (contingency tables), characters (text mining), or functions (time series, fMRI).

Statistical Analysis

There are two ways of analyzing data:

Descriptive data analysis (exploratory data analysis)
- summarizes data into statistics, such as the mean, median, range, standard deviation, ...
- visualizes features of data with histograms, pie charts, boxplots, ...

Inferential statistical analysis
- assumes a probability model on the collected data (sample)
- investigates features of the distribution, or of a family of distributions, to infer the features of the entire group of individuals (population), which may be impossible or too expensive to examine.

We will focus on the latter in this course.

Example 1: Quality Control

Consider a population of N elements, for instance, a shipment of manufactured items. An unknown number Nθ of these elements are defective. It is expensive to examine all items when N is large (or impossible if the inspection is destructive). To learn θ, one may randomly draw a sample of size n without replacement and inspect it. (Assume all items have the same probability of being defective.)

Population: N items, Nθ of them defective.
Sample (data): n sampled items, X defective items in the sample.
Probability model:

P(X = k) = \binom{Nθ}{k} \binom{N−Nθ}{n−k} / \binom{N}{n},   k = 0, 1, ..., min(Nθ, n).

This is the hypergeometric distribution, H(Nθ, N, n), a family of distributions indexed by θ. Here θ is the unknown parameter that we want to estimate.
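
As a small sketch, the likelihood of the observed count k can be scanned over all candidate defect counts D = Nθ; the numbers N = 1000, n = 50, k = 3 below are invented for illustration, and scipy's hypergeom is parameterized as (population size, number of defectives, number of draws).

    from scipy.stats import hypergeom

    N, n, k = 1000, 50, 3   # hypothetical shipment, sample size, observed defectives
    # likelihood of observing k for each candidate number of defectives D = N*theta
    likelihood = {D: hypergeom(N, D, n).pmf(k) for D in range(k, N - n + k + 1)}
    D_hat = max(likelihood, key=likelihood.get)
    print(D_hat, D_hat / N)   # MLE: D_hat = 60, theta_hat = 0.06 (close to k/n)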

Example 2: Measurement Problem

An experimenter makes n independent determinations of a physical constant µ.

Population: all measurements for physical experiments.
Sample (data): observed measurements X_1, ..., X_n from recorded experiments, subject to measurement errors.
Probability model: X_i = µ + ε_i, 1 ≤ i ≤ n.

Different distributional assumptions can be considered for ε:
1. the ε_i are independent;
2. the ε_i are identically distributed;
3. the distribution of ε_i does not depend on µ.

Two common specifications:
(1) ε_i iid N(0, σ²). Here θ = (µ, σ²) ∈ Θ, with Θ = R × R⁺ ⊂ R².
(2) ε_i iid F, where F is a cdf with expectation 0 and finite variance σ².
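
A minimal simulation sketch of this model under specification (1); the values µ = 9.81, σ = 0.05, and n = 25 are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n = 9.81, 0.05, 25          # invented "true" constant and error scale
    x = mu + rng.normal(0.0, sigma, n)     # X_i = mu + eps_i, eps_i iid N(0, sigma^2)
    print(x.mean(), x.std(ddof=1))         # natural estimates of mu and sigma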

Example 3: Treatment Effect

To compare the treatment effects of drugs A and B for a given disease, m subjects were assigned to drug A and n were assigned to drug B. Let X_1, ..., X_m be the responses of the m subjects receiving drug A, and Y_1, ..., Y_n be the responses of the n subjects receiving drug B. If drug A is a placebo, then X_1, ..., X_m are referred to as control observations and Y_1, ..., Y_n as treatment observations.

Population: responses of all subjects receiving drugs A and B.
Sample (data): X_1, ..., X_m for A, Y_1, ..., Y_n for B.
Probability model: different models can be considered:
1. X_i iid F, Y_j iid G; the model is (F, G).
2. F = N(µ, σ²) and G = N(µ + Δ, σ²); the parameter is (Δ, µ, σ²).
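
A quick simulation sketch of model 2; the values Δ = 1.5, µ = 10, σ = 2, m = n = 30 are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    mu, Delta, sigma, m, n = 10.0, 1.5, 2.0, 30, 30
    x = rng.normal(mu, sigma, m)             # control: F = N(mu, sigma^2)
    y = rng.normal(mu + Delta, sigma, n)     # treatment: G = N(mu + Delta, sigma^2)
    print(y.mean() - x.mean())               # natural estimate of the shift Delta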

Two Types of Questions

Given X_1, ..., X_n iid F (or f), referred to as a random sample of size n from the distribution F (or f), how do we infer F or f?

F = {f(x) : f > 0, ∫_R f(x) dx = 1}, or
F = {F(x) : F(−∞) = 0, F(+∞) = 1, F is monotone increasing and right continuous}.
Nonparametric model.

Given X_1, ..., X_n iid F(x; θ) or f(x; θ), how do we infer θ?

F = {f(x; θ) : θ ∈ Θ}, where θ is an unknown parameter that can take values in the parameter space Θ.
Parametric model.

Type of Statistical Models

Nonparametric model: the goal is F or f.
Parametric model: the goal is θ.

Since the second question has a smaller space of candidates, we start with question 2, the simpler one.

Sometimes θ is a vector but we are only interested in one (or some) of its components. In this case, the remaining components are referred to as nuisance parameters.

A semiparametric model is a combination of a parametric and a nonparametric model; the goal is f(x; θ, g), which is only partially specified by the parameter θ, with g an unspecified (infinite-dimensional) function.

What is Statistical Inference?

Statistical inference is the process of using data to infer the distribution that generated the data.

Three components of statistical inference:
- Point Estimation
- Confidence Interval
- Hypothesis Testing

Point Estimation

Point Estimation

Parameter of interest: a fixed and unknown population parameter θ, or a function of the model parameter, g(θ). E.g., the population mean µ, the population standard deviation σ, ...

Point estimator: a statistic from a sample that is used to estimate the parameter of interest, θ̂ (which is a random variable). E.g., the sample mean X̄ = Σ_i X_i / n for µ, and the sample standard deviation S = √( Σ_i (X_i − X̄)² / (n−1) ) for σ.

Estimate: the numerical value of an estimator in an observed sample. E.g., the observed sample mean x̄ = Σ_i x_i / n for µ, and the observed sample standard deviation s = √( Σ_i (x_i − x̄)² / (n−1) ) for σ.

Point estimation: the process of providing a point estimator.

Example 1: Poisson Model (Ex 4.1.3)

Suppose the number of customers X that enter a store during 9–10 am follows a Poisson distribution with parameter θ. Suppose a random sample of the number of customers that enter the store during 9–10 am on 10 days results in the values

9, 7, 9, 15, 10, 13, 11, 7, 2, 12.

What is a good guess of θ?
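
A natural guess is the sample mean, which the next slide shows is also the maximum likelihood estimator for a Poisson model:

    data = [9, 7, 9, 15, 10, 13, 11, 7, 2, 12]
    theta_hat = sum(data) / len(data)
    print(theta_hat)   # 9.5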

Maximum Likelihood Estimation (MLE)

The observed values in a random sample, x_1, x_2, ..., x_n, are called realizations of the sample.

Likelihood function: the joint pdf of the realizations (observed data),

L(θ) = f_1(x_1; θ) f_2(x_2; θ) ··· f_n(x_n; θ) = Π_{i=1}^n f(x_i; θ)   (if iid).

The larger L(θ) is, the more likely it is that we observe the realizations x_1, x_2, ..., x_n. The θ that maximizes the likelihood function L(θ) is referred to as the maximum likelihood estimator (MLE):

θ̂ = argmax_{θ ∈ Θ} L(θ) = argmax_{θ ∈ Θ} log L(θ).
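
A sketch of carrying out this maximization numerically for the Poisson data above; the bounded search interval (0.01, 30) is an arbitrary choice that safely contains the maximizer.

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.stats import poisson

    data = np.array([9, 7, 9, 15, 10, 13, 11, 7, 2, 12])

    def neg_log_lik(theta):
        # minus the Poisson log-likelihood: -sum_i log f(x_i; theta)
        return -poisson.logpmf(data, theta).sum()

    res = minimize_scalar(neg_log_lik, bounds=(0.01, 30), method="bounded")
    print(res.x)   # ~9.5, matching the closed-form MLE (the sample mean)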

Example 2: Normal Model (Example 4.1.3)

Example 3: Uniform Distribution (Example 4.1.4)

Properties of MLE

- Plug-in theorem: if θ̂ is the MLE of θ, then g(θ̂) is the MLE of g(θ).
- The MLE is consistent.
- The MLE is asymptotically unbiased.
- The MLE is efficient.
- The MLE is often asymptotically normal.
- The MLE is equivalent to the LSE in regression under the normality assumption.

Nonparametric MLE: for CDF

Empirical CDF for F:

F̂_X(x) = (1/n) Σ_{i=1}^n 1(X_i ≤ x).

This stepwise function is called the empirical CDF. It is an unbiased estimator of the CDF.

Because it can be viewed as an average of Bernoulli random variables, it obeys the LLN as n → ∞. It can be proved to be a √n-consistent estimator of the CDF, and it is asymptotically normal. It can further be proved to be a uniformly consistent estimator of the CDF.
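
A minimal sketch of the empirical CDF as defined above:

    import numpy as np

    def ecdf(sample, x):
        # fraction of sample points at or below x
        return np.mean(np.asarray(sample) <= x)

    rng = np.random.default_rng(2)
    sample = rng.normal(size=100)
    print(ecdf(sample, 0.0))   # should be near Phi(0) = 0.5 for N(0,1) data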

Nonparametric MLE: for PDF or PMF

Discrete case: if the sample space (the possible outcomes of X) is finite, one may list all possible values {a_1, a_2, ..., a_m} and define

f̂(a_j) = (1/n) Σ_{i=1}^n 1(X_i = a_j),   for j = 1, ..., m.

If the sample space is infinite, one may keep the more probable values and group the others: {a_1, a_2, ..., a_m, ã_{m+1}}, where ã_{m+1} = {a_{m+1}, a_{m+2}, ...}. Rule of thumb: select m such that f̂(a_m) > f̂(ã_{m+1}).

Continuous case: for a given x, consider the window (x − h, x + h) for some h > 0, and set

f̂(x) = Σ_{i=1}^n 1{X_i ∈ (x − h, x + h)} / (2hn).
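
A sketch of this moving-window density estimate (a kernel density estimate with a uniform kernel); the bandwidth h = 0.5 is an arbitrary choice.

    import numpy as np

    def f_hat(sample, x, h):
        sample = np.asarray(sample)
        # count points within (x - h, x + h), normalized by window width and n
        return np.sum(np.abs(sample - x) < h) / (2 * h * len(sample))

    rng = np.random.default_rng(3)
    sample = rng.normal(size=1000)
    print(f_hat(sample, 0.0, 0.5))   # near the N(0,1) density 1/sqrt(2*pi) ~ 0.399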

Confidence Interval

Motivation for Confidence Interval

In the previous normal-model example, µ̂_ML = X̄ maximizes the likelihood of observing the recorded data. But P(µ̂ = µ) = ?

So we need an estimate of the estimation error. In this example, we want to estimate SD(µ̂) or Var(µ̂), to understand by how much µ̂ misses µ.

In general, let θ be the parameter of interest and θ̂ an estimator of θ. An estimator of the standard deviation of θ̂ is called the standard error of θ̂, denoted se(θ̂):

se(θ̂) := \widehat{SD}(θ̂).

Confidence Interval

Interval estimate: an interval bounded by two values that is used to estimate the population parameter of interest. (A bound could be infinite.)

Some intervals are more useful than others: we want intervals to 1. have small length and 2. have a high chance of capturing the true parameter. When we widen the interval we are always more confident that the true parameter is inside it, because we have included more values.

Confidence level/coefficient: the probability that an interval estimate captures the population parameter of interest, often denoted by 1 − α for some α ∈ (0, 1).

Confidence interval (CI) estimate: an interval estimate (L, U) that has a specified (and justified) level of confidence, say 100(1 − α)%:

1 − α = P(θ ∈ (L, U)).

How to Interpret Confidence Intervals

Since the sample is random, so are confidence intervals, just like an estimator. If I collect many different samples and so have many different 100(1 − α)% CIs, then I expect 100(1 − α)% of these intervals to capture the true parameter.

Practical interpretation: I am 100(1 − α)% confident that the interval contains/captures the population parameter (mean, variance, median, ...). The interpretation rests on the idea of repeated samples, just like sampling distributions.

Note: once the sample is drawn, the realized value of the confidence interval is (l, u), an interval of real numbers, which is no longer random. It either contains θ or it does not.
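
A small simulation sketch of this repeated-sampling interpretation: the fraction of 95% z-intervals (σ treated as known) that capture the true mean should be close to 0.95.

    import numpy as np

    rng = np.random.default_rng(4)
    mu, sigma, n, reps = 0.0, 1.0, 30, 10_000
    z = 1.96                                   # z_{alpha/2} for alpha = 0.05
    hits = 0
    for _ in range(reps):
        x = rng.normal(mu, sigma, n)
        half = z * sigma / np.sqrt(n)          # half-width with sigma known
        hits += (x.mean() - half < mu < x.mean() + half)
    print(hits / reps)                         # ~0.95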

Example: CI for µ Under Normality

Case 1: σ² is known.
Case 2: σ² is unknown. (Example 4.2.1)
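
A sketch of both intervals, x̄ ± z_{α/2} σ/√n for known σ and x̄ ± t_{n−1,α/2} S/√n for unknown σ; the data are reused from the Poisson example, and the "known" σ = 3 is an assumption made purely for illustration.

    import numpy as np
    from scipy.stats import norm, t

    x = np.array([9, 7, 9, 15, 10, 13, 11, 7, 2, 12], dtype=float)
    n, alpha = len(x), 0.05
    xbar, s = x.mean(), x.std(ddof=1)

    sigma = 3.0                                   # Case 1: sigma pretended known
    z = norm.ppf(1 - alpha / 2)
    print(xbar - z * sigma / np.sqrt(n), xbar + z * sigma / np.sqrt(n))

    tq = t.ppf(1 - alpha / 2, df=n - 1)           # Case 2: sigma unknown, use S
    print(xbar - tq * s / np.sqrt(n), xbar + tq * s / np.sqrt(n))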

Pivotal Statistics

In the construction of the above CIs, we used the statistics

Z = (X̄ − µ) / (σ/√n)   and   T = (X̄ − µ) / (S/√n).

These statistics are called pivotal statistics. They are random variables with the following two properties:

I. Each is a function of the unknown parameter of interest, µ, but contains no other unknown parameters.
II. The distribution of a pivotal statistic is known and free of any unknown parameters.

It is a common technique in statistics to use pivotal statistics to construct confidence intervals and other statistical inference procedures.

Example: CI for µ Without Normality

Recall the CLT: let X_1, X_2, ..., X_n be iid with mean µ and variance σ² < ∞. Then

(X̄ − µ) / (σ/√n) →_D N(0, 1),   as n → ∞.

The result still holds if σ is replaced by a consistent estimator σ̂ (to be proved in Chapter 5), such as the sample standard deviation σ̂ = S.

Case 1: large sample (Example 4.2.2)

Example: CI for µ Without Normality

Case 2: large sample of Bernoulli random variables (Example 4.2.3)
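
For Bernoulli data, X̄ = p̂ and the CLT interval becomes p̂ ± z_{α/2} √(p̂(1−p̂)/n); a sketch, with the counts (540 successes out of n = 1000) invented for illustration:

    import numpy as np
    from scipy.stats import norm

    successes, n, alpha = 540, 1000, 0.05
    p_hat = successes / n
    half = norm.ppf(1 - alpha / 2) * np.sqrt(p_hat * (1 - p_hat) / n)
    print(p_hat - half, p_hat + half)   # ~ (0.509, 0.571)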

Example: CI for µ Without Normality

Case 3: small sample (Poisson model)

Example: CI for µ_1 − µ_2

Case 1: large sample

Example: CI for µ_1 − µ_2

Case 2: under normality (Example 4.2.4)

Example: CI for p_1 − p_2 in Bernoulli/Binomial

Hypothesis Testing

Motivation

Estimation was concerned with taking the data and making guesses about what the parameter could be. What if we first guess what the parameter is, and then decide whether the data support that guess?

Our initial guess is called the null hypothesis, denoted by H_0. The alternative hypothesis, denoted by H_1, is the opposite of the null hypothesis.

Hypothesis testing: a process for deciding between two opposing hypotheses based on data. Statistical hypotheses are explicit statements about population parameters.

Hypothesis Testing

Hypothesis testing is an inference tool that uses sample data to examine whether statistical hypotheses are true. It is usually done in the following 5 steps.

I. State the null, H_0, and alternative, H_1, hypotheses. H_0: neutral / no difference / no effect; H_1: a contradictory claim to H_0. The choice depends on the questions we want to answer.

II. Specify the significance level: a pre-determined small positive number, denoted by α, that bounds the probability of a false rejection.

III. Compute the test statistic: a numerical summary of the data that measures the deviation of the data from H_0 toward H_1, usually a pivotal statistic.

IV. Derive the rejection region, or find the p-value: the probability, computed under H_0, of observing data at least as extreme as what was observed.

V. Draw conclusions: reject H_0 if the p-value is smaller than α, or if the observed data fall into the rejection region.
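
A sketch walking through the five steps with a one-sample t-test on made-up data: H_0: µ = 10 vs H_1: µ ≠ 10 at level α = 0.05.

    import numpy as np
    from scipy.stats import t

    x = np.array([9, 7, 9, 15, 10, 13, 11, 7, 2, 12], dtype=float)
    mu0, alpha, n = 10.0, 0.05, len(x)                   # steps I-II
    T = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))  # step III: pivotal statistic
    p_value = 2 * t.sf(abs(T), df=n - 1)                 # step IV: two-sided p-value
    print(T, p_value, p_value < alpha)                   # step V: reject iff p < alpha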

Decisions in Hypothesis Testing

Let X_1, ..., X_n iid f(x; θ), θ ∈ Θ. Consider testing H_0: θ ∈ Θ_0 versus H_1: θ ∈ Θ_1, where Θ_0 ∪ Θ_1 = Θ and Θ_0 ∩ Θ_1 = Ø. There are two types of wrong decisions in hypothesis testing.

When H_0 is true but we reject H_0: false positive, Type I error.
When H_1 is true but we fail to reject H_0: false negative, Type II error.

                             H_0 is true               H_1 is true
Reject H_0                   Type I error              Correct decision
                             (False Positive)          (Correct Positive)
                             Probability = α           Power = 1 − β
Fail to reject H_0           Correct decision          Type II error
(Accept H_0)                 (Correct Negative)        (False Negative)
                             Probability = 1 − α       Probability = β

Decision Rule

If (X_1, ..., X_n) ∈ C, then reject H_0 (C is the rejection/critical region);
If (X_1, ..., X_n) ∈ C^c, then fail to reject H_0 (C^c is the acceptance region).

Goal: select C so that the probabilities of making errors are minimized. Typically the Type I error is considered more serious, so we first bound the probability of a Type I error and then minimize the probability of a Type II error. The probability of making a Type II error is denoted by β.

Power: the probability of making a correct rejection, 1 − β.

Generally, the probability of making a Type I error increases as the probability of making a Type II error decreases.

Size of the Test and Power

We say a rejection/critical region C is of size α if

α = max_{θ ∈ Θ_0} P_θ{(X_1, ..., X_n) ∈ C},

which is an upper bound on the probability of a false rejection. Furthermore, the power is defined as

power = P_θ{(X_1, ..., X_n) ∈ C},   θ ∈ Θ_1,

which is 1 − the probability of making a Type II error.

We see that the power of a test depends on its critical region (rule). Denote the power function by

r_C(θ) = P_θ{(X_1, ..., X_n) ∈ C},   θ ∈ Θ.

Given two critical regions C_1 and C_2, both of size α, we say C_1 is better than C_2 if r_{C_1}(θ) ≥ r_{C_2}(θ) for all θ ∈ Θ_1.
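
A sketch of a power function in a concrete case: for H_0: µ ≤ 0 vs H_1: µ > 0 with known σ, the size-α test rejects when √n X̄/σ ≥ z_α, and its power at µ is 1 − Φ(z_α − √n µ/σ); the choices n = 25, σ = 1, α = 0.05 below are arbitrary.

    import numpy as np
    from scipy.stats import norm

    def power(mu, n=25, sigma=1.0, alpha=0.05):
        z_alpha = norm.ppf(1 - alpha)
        # P( sqrt(n)*Xbar/sigma >= z_alpha ) when Xbar ~ N(mu, sigma^2/n)
        return norm.sf(z_alpha - np.sqrt(n) * mu / sigma)

    for mu in [0.0, 0.2, 0.5, 1.0]:
        print(mu, power(mu))   # alpha at mu = 0, increasing toward 1 as mu grows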

Example 1: Binomial Model

Example 2: Normal Model

P-value

So far, we have derived decision rules and rejection regions using α. They depend on the choice of the significance level α, but no information from the data, except the sample size, is used in these decision rules.

In practice, however, the data are often already collected, and one may not know a good choice of α in advance, or may want to know the decisions for a whole set of values of α. For example, given an observed sample mean x̄ = 5, should we reject H_0 at α = 0.01, α = 0.05, or α = 0.1, ...?

In this case, a p-value, instead of a rejection region, is often used to provide more information. Precisely, the p-value is the probability, given that H_0 is true, of observing data at least as extreme as the observed data in the direction of H_1. The p-value can be viewed as an observed significance level.

Example: Zea mays Growth (Examples 4.5.1, 4.5.5)

In 1878, Darwin recorded the heights of Zea mays plants to see the difference between cross- and self-fertilization. On each of 15 plots, one cross-fertilized plant and one self-fertilized plant were planted, grown, and then measured. The height difference between the cross-fertilized plant and the self-fertilized plant in the same plot was recorded. For the 15 plots, the mean is x̄ = 2.62 and the standard deviation is s = 4.72. Assume the height differences are independent and normally distributed. Are cross-fertilized plants taller than self-fertilized plants?

Population:
Sample:
Probability model:
Parameter of interest:
Null and alternative hypotheses:
Test statistic:
P-value:
Conclusion:
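
A sketch of filling in the test from the summary statistics: with µ_D the mean cross-minus-self height difference, test H_0: µ_D = 0 vs H_1: µ_D > 0 using the one-sample t statistic.

    import numpy as np
    from scipy.stats import t

    n, xbar, s = 15, 2.62, 4.72
    T = xbar / (s / np.sqrt(n))    # t statistic under H0: mu_D = 0
    p_value = t.sf(T, df=n - 1)    # one-sided p-value
    print(T, p_value)              # T ~ 2.15, p ~ 0.025: reject H0 at alpha = 0.05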

Example 3: F-test for Variances

Recall that in testing two sample means, to improve power, we assumed the variances of the two normal samples were the same. Is this assumption supported by the data? Can we test the assumption formally?

Simpson, Olsen, and Eden (1975) describe an experiment in which a random sample of 26 clouds was seeded with silver nitrate to see if they produced more rain than unseeded clouds. Suppose that, on a log scale, the rainfall in both samples is approximately normal, and that the sample means and sample variances are

x̄ = 5.13, ȳ = 3.99, s²_x = 63.96, s²_y = 67.39.
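
A sketch of the variance-ratio test on these summary statistics; equal sample sizes m = n = 26 are assumed from the description of the experiment.

    from scipy.stats import f

    m = n = 26
    s2x, s2y = 63.96, 67.39
    F = s2x / s2y                      # ratio of sample variances; ~F(25, 25) under equal variances
    p_one = f.sf(F, dfn=m - 1, dfd=n - 1)
    p_two = 2 * min(p_one, 1 - p_one)  # two-sided p-value
    print(F, p_two)                    # F ~ 0.95: no evidence against equal variances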

Level, Power, P-value of F-test in a One-sided Test

Consider H_0: σ_1² ≤ σ_2² vs H_1: σ_1² > σ_2².

Decision rule: reject H_0 in favor of H_1 if S_x²/S_y² ≥ c, where c is chosen such that

α = max_{σ_1² ≤ σ_2²} P(S_x²/S_y² ≥ c | σ_1², σ_2²).

Proposition: Let V = S_x²/S_y², let c be the upper-α quantile of F_{n−1,m−1}, and let G_{n−1,m−1} be the c.d.f. of the F_{n−1,m−1} distribution (that is, 1 − α = G_{n−1,m−1}(c)). Then the power function of the test, P(V ≥ c), satisfies:

1. P(V ≥ c) = 1 − G_{n−1,m−1}((σ_2²/σ_1²) c).
2. P(V ≥ c) = α, if σ_1² = σ_2².
3. P(V ≥ c) < α, if σ_1² < σ_2².
4. P(V ≥ c) > α, if σ_1² > σ_2².
5. P(V ≥ c) → 0, as σ_1²/σ_2² → 0.
6. P(V ≥ c) → 1, as σ_1²/σ_2² → ∞.
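
A sketch that evaluates item 1 numerically: since (S_x²/σ_1²)/(S_y²/σ_2²) ~ F_{n−1,m−1}, the power at a variance ratio r = σ_1²/σ_2² is 1 − G(c/r); the sizes n = m = 26 are reused from the cloud-seeding example.

    from scipy.stats import f

    n, m, alpha = 26, 26, 0.05
    c = f.ppf(1 - alpha, dfn=n - 1, dfd=m - 1)   # upper-alpha quantile of F_{n-1,m-1}

    def power(r):                                # r = sigma_1^2 / sigma_2^2
        return f.sf(c / r, dfn=n - 1, dfd=m - 1)

    for r in [0.5, 1.0, 2.0, 4.0]:
        print(r, power(r))   # below alpha for r < 1, exactly alpha at r = 1, above for r > 1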