Hypothesis testing: Data to decisions


The idea
Null hypothesis H0: the DGP/population has property P.
Under the null, a sample statistic has a known distribution.
If, under that distribution, the value of the statistic is unlikely, reject the null.

Road map
This chapter illustrates the general procedure by discussing some of the most common tests people perform:
1. Simple mean tests
2. Mean comparison tests
3. Frequency (or proportion) tests
4. Goodness of fit tests
5. Independence tests
The next chapter applies the same procedure to the regression context.

Simple mean test
You believe that your customer base has mean income $40,000.
A recent, representative survey of 1,000 customers showed their mean income to be $37,000, with a standard deviation of $2,000.
Is it time to revise your beliefs?

Mean test design
H0: μ = $40,000.
If we also knew σ (the population standard deviation), we would know that the sample mean x̄ is roughly normally distributed with mean $40,000 and standard deviation σ/√n.
But we don't.

The unknown sigma problem
The sample standard deviation s = 2,000 is an estimate of σ. It, too, is approximately normally distributed in large samples, by the CLT.
Test statistic: T = (x̄ − μ0) / (s/√n).
Under the null, this has a t-distribution with n − 1 degrees of freedom, which is approximately standard normal if n is large.
Reject, basically, if T > 1.96 or T < −1.96, or look up t-tables for more precision.
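A minimal Python sketch (not part of the original slides, assuming scipy is available) that runs this one-sample t-test directly from the summary statistics in the example; the variable names are my own.

```python
import math
from scipy import stats

# Summary statistics from the example above
n = 1_000           # sample size
xbar = 37_000       # sample mean income
s = 2_000           # sample standard deviation
mu0 = 40_000        # hypothesized population mean under H0

se = s / math.sqrt(n)                      # standard error of the mean
T = (xbar - mu0) / se                      # t statistic
p_two_sided = 2 * stats.t.sf(abs(T), df=n - 1)

print(f"T = {T:.2f}, two-sided p-value = {p_two_sided:.3g}")
# |T| is about 47, far beyond 1.96, so H0: mu = $40,000 is rejected
```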

Confidence intervals
s/√n is called the standard error of the mean.
For n large enough, the interval [x̄ − 1.96·s/√n, x̄ + 1.96·s/√n] contains the population mean μ with (roughly) 95% confidence.
This is called a confidence interval for the mean.
If the hypothesized value μ0 falls outside this interval, reject the null with 95% confidence.
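A short sketch of the corresponding confidence-interval check, under the same assumptions as the previous snippet (scipy available, summary statistics from the example):

```python
import math
from scipy import stats

n, xbar, s, mu0 = 1_000, 37_000, 2_000, 40_000
se = s / math.sqrt(n)

# 95% confidence interval for the mean, using the t critical value (close to 1.96 for large n)
t_crit = stats.t.ppf(0.975, df=n - 1)
ci = (xbar - t_crit * se, xbar + t_crit * se)

print(f"95% CI: ({ci[0]:.0f}, {ci[1]:.0f})")
print("Reject H0" if not (ci[0] <= mu0 <= ci[1]) else "Fail to reject H0")
```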

General structure of t-tests
When an estimate follows a normal distribution, then
(estimate − null value) / (standard error of estimate)
follows a t-distribution.
Only the degrees of freedom need to be established, and that is test-specific.
But any time you have a large, representative sample, the distribution is approximately the standard normal, so you are good to go.

Remark
In our current example, the odds that, literally, μ = $40,000 are, literally, zero.
Failing to reject that hypothesis means, simply, that that guess cannot be dismissed in favor of distant alternatives.
For instance, if you claim that Cov(X, Y) is positive and large, then you should be able to reject the hypothesis that Cov(X, Y) = 0.
That's putting your theory to a test it should pass with flying colors.

Critical values
We can design a test by choosing a significance level (or size, or alpha). Say we set α = 5%.
Then we can pick a critical value T* such that P(T ≥ T*) ≤ 5% if the null hypothesis is correct. Reject if T > T*.
For a normally distributed statistic with mean μ0 and standard error σ under the null: T* = μ0 + 1.645σ.
Or we could pick two values T*_low and T*_high such that P(T ≤ T*_low or T ≥ T*_high) ≤ 5%. Reject if T > T*_high or T < T*_low.
For normally distributed statistics, e.g.: T*_high = μ0 + 1.96σ, T*_low = μ0 − 1.96σ.

Critical values vs p-values
p-values look at the outcome of the test and then calculate its probability, in some sense or other.
For instance, in the context of a one-sided test, a particular sample gives you a statistic value T_obs. The p-value is P(T ≥ T_obs), computed under the null.
Ideally, you should design a test fully ex ante (choose its size, in particular) and then let the data speak.
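A small sketch of how critical values and a p-value could be computed for a standardized, approximately normal statistic; the observed value T_obs = 2.3 is made up purely for illustration.

```python
from scipy import stats

alpha = 0.05

# Critical values for a standardized (approximately normal) test statistic
t_one_sided = stats.norm.ppf(1 - alpha)       # about 1.645
t_two_sided = stats.norm.ppf(1 - alpha / 2)   # about 1.960

# p-value for a one-sided test with a made-up observed statistic
T_obs = 2.3
p_one_sided = stats.norm.sf(T_obs)            # P(T >= T_obs) under H0

print(f"one-sided cutoff = {t_one_sided:.3f}, two-sided cutoff = {t_two_sided:.3f}")
print(f"p-value for T_obs = {T_obs}: {p_one_sided:.4f}")
```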

Type 1 errors
There is a risk that we may reject a null hypothesis when it is, in fact, correct.
When we use a 5% level to compute critical values, we create a test that has a 5% chance of producing a type 1 error.
This is often termed a false positive, since rejecting H0 is often viewed as finding an effect.

Type 2 errors
There is a risk that we may fail to reject a null hypothesis when it is, in fact, incorrect.
The problem with this language is that, since null hypotheses are usually quite specific, "incorrect" can mean a whole lot of different things.
It also means that H0, taken literally, is often false (see the Remark slide).
So how do people measure the risk of type 2 errors in practice? Answer: in a massively ad hoc way.
For instance, in the context of a one-sided test for a mean with critical value T*, the risk of a type 2 error is typically computed as the probability of failing to reject when the truth sits at some specific alternative value.
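A sketch of this kind of type 2 error (and power) calculation for the income example; the one-sided setup and the alternative value mu_a = 40,150 are my own illustrative choices, not something given on the slides.

```python
import math
from scipy import stats

# One-sided test of H0: mu = mu0 against mu > mu0, rejecting when xbar > c
n, s, mu0, alpha = 1_000, 2_000, 40_000, 0.05
se = s / math.sqrt(n)
c = mu0 + stats.norm.ppf(1 - alpha) * se   # critical value on the xbar scale

# Hypothetical alternative value for the true mean (arbitrary choice for illustration)
mu_a = 40_150

type2 = stats.norm.cdf((c - mu_a) / se)    # P(fail to reject | mu = mu_a)
power = 1 - type2
print(f"critical value = {c:.0f}, type II error = {type2:.2f}, power = {power:.2f}")
```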

Mean comparison
A university wants to know if it has a gender wage-gap problem.
It obtains a sample of male and female employees with similar education, age, and occupation: n1 females and n2 males.
Mean income among women is $97,000, with a standard deviation of $1,000.
Mean income among men is $100,000, with a standard deviation of $1,500.
H0: μ1 = μ2. Can it be rejected?

Test statistic
T = (x̄2 − x̄1) / √(s1²/n1 + s2²/n2) is t-distributed.
The expression for the degrees of freedom looks nasty:
df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]
But that's why we have computers (Excel: ttest).
What's more, if σ1 ≈ σ2, then the degrees of freedom are roughly n1 + n2 − 2.
And if n1 and n2 are large, you can assume a standard normal distribution, so, basically, reject if T > 1.96 or T < −1.96.
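In Python, scipy's ttest_ind_from_stats can run this unequal-variance test directly from summary statistics. The sample sizes n1 = n2 = 200 below are assumptions for illustration, since the slide leaves them generic.

```python
from scipy import stats

# Sample sizes are left generic on the slide; 200 each is an assumption for illustration
n1, n2 = 200, 200
mean_f, sd_f = 97_000, 1_000    # women
mean_m, sd_m = 100_000, 1_500   # men

# Unequal-variance (Welch) two-sample t-test from summary statistics
t_stat, p_value = stats.ttest_ind_from_stats(
    mean_m, sd_m, n2, mean_f, sd_f, n1, equal_var=False
)
print(f"T = {t_stat:.1f}, p-value = {p_value:.3g}")
```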

Frequency tests (the easiest of them all)
H0: the machine's true defect rate is π = 1%.
Under the null, the DGP/population standard deviation is √(π(1 − π)).
So, for a sample of size n, the standard deviation (or standard error) of the sample proportion is √(π(1 − π)/n).
The test statistic is then T = (p̂ − π) / √(π(1 − π)/n), which is approximately standard normal for large n.
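A short sketch of the resulting proportion test; the sample size and the number of observed defects are hypothetical numbers chosen for illustration, not part of the slides.

```python
import math
from scipy import stats

pi0 = 0.01        # hypothesized defect rate under H0
n = 500           # hypothetical sample size
defects = 9       # hypothetical number of defective items observed

p_hat = defects / n
se = math.sqrt(pi0 * (1 - pi0) / n)      # standard error under the null
T = (p_hat - pi0) / se
p_two_sided = 2 * stats.norm.sf(abs(T))

print(f"p_hat = {p_hat:.3f}, T = {T:.2f}, p-value = {p_two_sided:.3f}")
```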

Goodness of fit tests
H0: the data came from process X.
Example: are the 3,000 draws in data2d2d.xlsx from X?
If they were, I'd expect to see around 300 draws of 90; instead I see just 69 of them.
How far are the draws I got from what I'd expect under H0?
If they are farther than what sample uncertainty alone can reasonably explain, reject.

Chisquare distance
Data (O for observed, E for expected):

Value    Observed (O)   Expected (E)   (O−E)²/E
90                 69            300     177.87
100             1,969          1,500     146.64
110               962          1,200      47.20
Total           3,000          3,000     371.71

Chisquare goodness of fit test
If H0 were correct, we'd expect the chisquare distance to be small.
Specifically, if H0 is correct, that distance roughly follows a chisquare distribution.
Degrees of freedom = (number of values − 1) = 2 in this example.
A look at a chisquare table says that 95% of the time a draw from such a distribution should be below 5.991.
We got 371.71. Reject H0 with high confidence.
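The same goodness-of-fit test can be run in Python with scipy.stats.chisquare, using the observed and expected counts from the table above; a minimal sketch, assuming scipy is available:

```python
from scipy import stats

# Observed and expected counts from the data2d2d.xlsx example
observed = [69, 1_969, 962]
expected = [300, 1_500, 1_200]

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, df = {len(observed) - 1}, p-value = {p_value:.3g}")
# chi-square is about 371.7, far above the 5.991 cutoff, so H0 is rejected
```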

Chisquare independence tests
H0: Y is independent of X.
Example: is spending independent of gender in a particular population?
If you have detailed data on spending from a good sample, you could run a regression.
But what if I only have coarse/categorical data?

Example
Assume we get the following data from a representative sample:

Spending        Male (O)   Female (O)     All   Frequency
<50K                 700          601   1,301        0.34
50-99                513          557   1,070        0.28
100-199              410          518     928        0.24
200 or more          227          309     536        0.14
Total              1,850        1,985   3,835        1.00

Expected outcome
If spending were independent of gender, the frequency of observations in each spending group should be similar for both groups, hence the same as in the overall sample.
In other words, I'd expect something close to:

Spending        Male (E)   Female (E)
<50K              627.60       673.40
50-99             516.17       553.83
100-199           447.67       480.33
200 or more       258.57       277.43
Total              1,850        1,985

Chisquare distance
The chisquare distance between observed and expected is:

Spending        Male (O−E)²/E   Female (O−E)²/E
<50K                     8.35              7.78
50-99                    0.02              0.02
100-199                  3.17              2.95
200 or more              3.85              3.59
Total                   15.39             14.34

For a total distance of around 29.74.

Chisquare independence tests
H0: spending is independent of gender.
If H0 is correct, the chisquare distance roughly follows a chisquare distribution.
Degrees of freedom = (number of spending categories − 1) × (number of gender categories − 1) = 3.
A look at a chisquare table shows that 95% of the time a draw from a chisquare distribution with 3 degrees of freedom should be below 7.815.
We got 29.74: reject H0 with confidence.
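In Python, scipy.stats.chi2_contingency reproduces this calculation from the observed table alone (it computes the expected counts itself); a minimal sketch using the counts from the example:

```python
import numpy as np
from scipy import stats

# Observed spending-by-gender counts (rows: <50K, 50-99, 100-199, 200 or more)
observed = np.array([
    [700, 601],
    [513, 557],
    [410, 518],
    [227, 309],
])

chi2, p_value, df, expected = stats.chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.3g}")
# chi-square is about 29.7 with 3 degrees of freedom: reject independence
```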

The bootstrap
Classical hypothesis testing relies on a lot of theory.
There is an alternative procedure that works for any statistic, however complicated, and requires few if any large-sample assumptions: bootstrapping.
Idea:
1. Use the available data to generate different samples, hence a distribution of the statistic.
2. Use that distribution to produce standard errors and/or confidence intervals.
One assumption: the sample is representative of the population.
An example will help illustrate the power of bootstrapping.
More generally, modern stats is putting ever more emphasis on methods which, like bootstrapping, require little to no theory.
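A minimal bootstrap sketch with numpy, ahead of the fuller example: the sample below is simulated purely as a stand-in for real data, and the statistic (the mean) is just the simplest choice; the same loop works for any statistic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample; in practice this would be your observed data
sample = rng.normal(loc=37_000, scale=2_000, size=1_000)

# Bootstrap: resample with replacement and recompute the statistic many times
n_boot = 10_000
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(n_boot)
])

se_boot = boot_means.std(ddof=1)                          # bootstrap standard error
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])  # 95% percentile interval
print(f"bootstrap SE = {se_boot:.1f}, 95% CI = ({ci_low:.0f}, {ci_high:.0f})")
```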