Parameter Estimation, Sampling Distributions & Hypothesis Testing


Parameter Estimation & Hypothesis Testing

In doing research, we are usually interested in some feature of a population distribution (which can be described using population parameters). Since populations are difficult (or impossible) to collect data on, we estimate population parameters using point estimates based on sample statistics. Sample statistics vary from sample to sample, making point estimates variable and unreliable. The distribution of a statistic (estimate) computed across many different samples is called the sampling distribution of that statistic (estimate). We can use the sampling distribution to estimate the likelihood associated with a hypothesized population parameter, or the margin of error (confidence interval) associated with a point estimate.

[Figure: a population, characterized by population parameters, and a sample drawn from it, characterized by sample statistics]

Law of Large Numbers

Let \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i. Then, for any \varepsilon > 0,

\lim_{n \to \infty} P\left( \left| \bar{X}_n - \mu \right| < \varepsilon \right) = 1

[Figure: mean sample age vs. sample size n]
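The law above can be seen empirically with a short simulation. This is an illustrative sketch; the population mean and standard deviation below are hypothetical, not from the slides:

```python
# Law of Large Numbers sketch: the sample mean of n i.i.d. draws
# gets closer to the population mean mu as n grows.
import random

random.seed(0)
mu = 10.0     # hypothetical population mean
sigma = 2.0   # hypothetical population standard deviation

def sample_mean(n):
    """Mean of n i.i.d. normal draws from the population."""
    draws = [random.gauss(mu, sigma) for _ in range(n)]
    return sum(draws) / n

# The deviation |X_bar_n - mu| shrinks (in probability) as n increases.
for n in (10, 1000, 100000):
    print(n, abs(sample_mean(n) - mu))
```

Running this shows the absolute deviation from µ falling by roughly an order of magnitude as n grows from 10 to 100000.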

Sampling Distributions

How reliable are sample statistics (as estimators) for a finite sample size?

Central Limit Theorem

Thanks to the central limit theorem we can compute the sampling distribution of the mean without having to actually draw samples and compute sample means.

Central limit theorem: Given a population with mean µ and standard deviation σ, the sampling distribution of the mean (i.e., the distribution of sample means) will itself have a mean of µ and a standard deviation (standard error) of σ/√n. Furthermore, whatever the distribution of the parent population, this sampling distribution will approach the normal distribution as the sample size n increases.
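A quick simulation can check both claims of the theorem. The uniform parent population and sample sizes below are hypothetical choices for illustration, not from the slides:

```python
# CLT sketch: draw many samples from a non-normal (uniform) population and
# check that the sample means have mean ~ mu and spread ~ sigma / sqrt(n).
import random
import statistics

random.seed(42)
n = 25               # size of each sample
num_samples = 20000  # number of samples drawn

# Uniform(0, 1) population: mu = 0.5, sigma = sqrt(1/12)
mu = 0.5
sigma = (1 / 12) ** 0.5

sample_means = [
    statistics.fmean(random.random() for _ in range(n))
    for _ in range(num_samples)
]

print(statistics.fmean(sample_means))   # close to mu = 0.5
print(statistics.stdev(sample_means))   # close to sigma / sqrt(25) ~ 0.058
```

Even though the parent population is flat rather than bell-shaped, a histogram of `sample_means` would already look approximately normal at n = 25.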

Standard Error

Just as the standard deviation (σ) of a population of scores provides a measure of the average distance between an individual score (x) and the population mean (µ), the standard error (\sigma_{\bar{X}}) provides a measure of the average distance between the sample mean (\bar{X}) and the population mean (µ):

\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}

Hypothesis Testing

Procedure for traditional (NHST) hypothesis testing
Roots:
- Significance testing: (Karl) Pearson & Fisher
- Decision-theoretic hypothesis testing: Neyman & (Egon) Pearson
Logic of the individual and combined approaches

Traditional (NHST) Hypothesis Testing

1. Begin with a research hypothesis H1 (defined in terms of population parameters).
2. Set up the null hypothesis H0.
3. Construct the sampling distribution of a particular statistic under the assumption that the null hypothesis is true.
4. Collect some data and use it to compute a sample statistic.
5. Compare the sample statistic to the distribution constructed in step 3.
6. Reject or retain H0 depending on the probability, under H0, of obtaining a sample statistic as extreme as the one we observed.
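Steps 3-6 can be sketched as a one-sample z-test, assuming a known population standard deviation; the numbers below are hypothetical:

```python
# One-sample z-test sketch for H0: mu = mu0 against a two-tailed H1.
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n):
    """Return (z, two-tailed p) for a sample mean xbar of size n."""
    se = sigma / n ** 0.5                   # standard error of the mean
    z = (xbar - mu0) / se                   # test statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # P(|Z| >= |z|) under H0
    return z, p

# Hypothetical data: xbar = 103, testing H0: mu = 100 with sigma = 15, n = 100
z, p = one_sample_z_test(103.0, 100.0, 15.0, 100)
print(z, p)   # z = 2.0, p ~ 0.046: reject H0 at alpha = 0.05
```

The sampling distribution under H0 (step 3) is the standard normal here; the p-value in step 6 is the probability, under H0, of a statistic at least as extreme as the observed one.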

Roots: Inferential Significance Testing

Significance testing, as conceived by Fisher (and Karl Pearson), was a heuristic for building an inductive case for or against a particular model.
Pearson (1900) conceived of p (essentially equivalent to a modern two-tailed p-value) as an index of the validity of a hypothesis. He later (1914) popularized this index by publishing tables of its value for a number of standard distributions.
Fisher (1925) suggested using p = 0.05 (or some smaller value) as a heuristic to determine whether to further consider the results of an experiment.
The ideas of the p-value, the null hypothesis, and significance come from this approach.

Roots: Decision-Theoretic Hypothesis Testing

Hypothesis testing was conceived by Jerzy Neyman and Egon Pearson (Karl's son) as an efficient and objective alternative to significance testing.
Neyman & Pearson (1933) wrote an abstract paper investigating an optimal long-run strategy for testing pairs of hypotheses. They suggested comparing the log likelihood ratio of the two hypotheses to a criterion computed from a fixed tail probability of incorrectly classifying one of them.
The concepts of Type I and Type II errors, α, β, power, critical regions, and fixed-criterion hypothesis testing all come from this approach.

Differences Between the Approaches

Fisher:
- Set up a statistical null hypothesis (must be exact).
- Report the exact level of significance (p).
- If the result is not significant, draw no conclusions.
- Only use this procedure to draw provisional conclusions.

Neyman-Pearson:
- Set up two statistical hypotheses (H0 & H1), both of which must be exact.
- Decide on α, β, and sample size before the experiment; these define a rejection region.
- If the data fall into the rejection region of H0, accept H1; otherwise accept H0.
- Always make a decision based on the available information.

Hypothesis Testing & The Null Hypothesis

Why do we test the null hypothesis H0?
Philosophical arguments:
- Finite observations cannot prove categorical propositions, only disprove them.
- Puts the burden on the researcher: anyone can create an apparent difference between conditions by using very small sample sizes.
- Assume no effect (or a standard effect) until given sufficient evidence.
Practical argument:
- The null hypothesis is specific and well-defined, making it easy to predict a sampling distribution.

Rejection Regions

α = 0.05, one-tailed test (H1: µ1 > µ0); α = 0.05, two-tailed test (H1: µ1 ≠ µ0)

[Figure: sampling distributions p(\bar{X}) with shaded rejection regions for the one- and two-tailed tests]

Why 0.05?
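The rejection regions above are bounded by critical values of the test statistic; for a z statistic they can be computed directly. A small sketch using Python's standard library:

```python
# Critical z values defining the alpha = 0.05 rejection regions.
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# One-tailed test (H1: mu1 > mu0): reject H0 if z > z_one
z_one = std_normal.inv_cdf(1 - alpha)      # ~ 1.645
# Two-tailed test (H1: mu1 != mu0): reject H0 if |z| > z_two
z_two = std_normal.inv_cdf(1 - alpha / 2)  # ~ 1.960

print(round(z_one, 3), round(z_two, 3))
```

Splitting α across two tails pushes each critical value further out, which is why a two-tailed test needs a larger |z| to reject.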

Errors in Hypothesis Testing

Because the hypothesis test relies on sample data, and sample data are variable, there is always a risk that the hypothesis test will lead to the wrong conclusion. Two types of errors are possible:
- Type I errors (false positives)
- Type II errors (false negatives)


[Figure: population distributions under H0 and H1 with σ0 = σ1 = σ (raw scores x), and the corresponding sampling distributions of the mean for n = 4, with σ_M = σ/√n = σ/2; shaded regions show α and β (sample means M)]

Errors in Hypothesis Testing

              H0 is true                     H0 is false
Reject H0     Type I error (P = α)           Correct decision (P = 1 − β)
Retain H0     Correct decision (P = 1 − α)   Type II error (P = β)

Power

The statistical power of a test is simply the probability of correctly rejecting the null hypothesis when it is false. For our purposes, you can think of this as the probability that the test will classify an actual difference in population means as significant.

[Figure: population distributions under H0 and H1 with σ0 = σ1 = σ (raw scores x), and the corresponding sampling distributions of the mean for n = 4, with σ_\bar{X} = σ/√n = σ/2; shaded regions show α, β, and power = 1 − β (sample means \bar{X})]

Factors that Affect the Power of a Test

1. The probability of a Type I error (α), or the level of significance, and the criterion for rejecting H0, which are directly related to each other.
2. The true difference between the underlying population means under the alternative hypothesis (µ1 − µ0).
3. The standard error(s) of the mean(s), which is a function of the sample size n and the population variance σ².
4. The particular research design and test used, and whether the test is one- or two-tailed.
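For a one-tailed z-test, these factors can be made concrete in a single formula. The function below is a sketch; the α, µ1 − µ0, σ, and n values used are hypothetical, not from the slides:

```python
# Power of a one-tailed z-test as a function of alpha, effect, sigma, and n.
from statistics import NormalDist

def z_test_power(alpha, mu_diff, sigma, n):
    """P(reject H0 | true mean difference mu_diff) for a one-tailed z-test."""
    se = sigma / n ** 0.5                     # standard error of the mean
    z_crit = NormalDist().inv_cdf(1 - alpha)  # rejection criterion under H0
    # Under H1 the z statistic is distributed N(mu_diff / se, 1)
    return 1 - NormalDist().cdf(z_crit - mu_diff / se)

base = z_test_power(0.05, 0.5, 1.0, 16)
print(base)                              # baseline power
print(z_test_power(0.10, 0.5, 1.0, 16))  # larger alpha -> more power
print(z_test_power(0.05, 1.0, 1.0, 16))  # larger difference -> more power
print(z_test_power(0.05, 0.5, 1.0, 64))  # larger n -> more power
```

Each change (raising α, increasing the true difference, or increasing n) shifts more of the H1 sampling distribution past the criterion, which is exactly the pattern the next slides illustrate graphically.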

Power as a Function of α

[Figure: population and sampling distributions with α = 0.05; β = 0.73, power = 1 − β = 0.27]

Power as a Function of α

[Figure: population and sampling distributions with α = 0.10; β = 0.62, power = 1 − β = 0.38]

Power as a Function of α

[Figure: population and sampling distributions with α = 0.20; β = 0.48, power = 1 − β = 0.52]

Power as a Function of (µ1 − µ0)

[Figure: population and sampling distributions with µ1 − µ0 = 0.5; β = 0.84, power = 1 − β = 0.16]

Power as a Function of (µ1 − µ0)

[Figure: population and sampling distributions with µ1 − µ0 = 1.0; β = 0.62, power = 1 − β = 0.38]

Power as a Function of (µ1 − µ0)

[Figure: population and sampling distributions with µ1 − µ0 = 2.0; β = 0.16, power = 1 − β = 0.84]

Power as a Function of n and σ

[Figure: σ = 1.5, n = 4, σ_\bar{X} = σ/√n = 0.75; β = 0.81, power = 1 − β = 0.19]

Power as a Function of n and σ

[Figure: σ = 0.75, n = 4, σ_\bar{X} = σ/√n = 0.375; β = 0.15, power = 1 − β = 0.85]

Power as a Function of n and σ

[Figure: σ = 1.5, n = 16, σ_\bar{X} = σ/√n = 0.375; β = 0.15, power = 1 − β = 0.85]
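The three slides above vary σ and n separately, but what actually drives the change in power is the standard error σ/√n; their values can be reproduced directly. A small sketch:

```python
# Standard error of the mean for the three sigma/n combinations above.
def standard_error(sigma, n):
    """sigma / sqrt(n)."""
    return sigma / n ** 0.5

print(standard_error(1.5, 4))    # 0.75
print(standard_error(0.75, 4))   # 0.375 (halving sigma)
print(standard_error(1.5, 16))   # 0.375 (quadrupling n)
```

Halving σ and quadrupling n yield the same standard error, and hence the same β and power (0.85 in both slides).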

Some Pros & Cons of Hypothesis Testing

Pros:
- Objective method for making decisions regarding data.
- Simple rules; does not require statistics expertise.
- In the absence of auxiliary biases (and in scrupulous hands), guarantees correct decisions in the long run.

Cons:
- Rigid, 1-bit decision making.
- Absolves scientists from thinking carefully about analysis.
- Long-run guarantees rely on replication and unbiased reporting & publication.
- p-values & significance levels are not useful for meta-analysis.