Lecture 06. DSUR Ch. 05: Exploring Assumptions of Parametric Statistics, Hypothesis Testing, and Power


Introduction
When assumptions are broken we cannot make accurate inferences or descriptions about reality: our models become flawed descriptions and any inferences drawn from them are compromised. This lecture covers the assumptions of parametric tests based on the normal distribution:
- Understand the assumption of normality
- Understand homogeneity of variance
- Know how to correct problems in the data (with respect to the assumption of normality)

Assumptions
Parametric tests based on the normal distribution assume:
- Normally distributed data: the distribution of samples and the model distribution (residuals)
- Homogeneity of variance
- Interval or ratio level data (some data are intrinsically not normally distributed)
- Independence of observations

The Normal Distribution: Review
Commonly the distribution of measurements (the frequency distribution of data collected on an interval scale) has a bell-shaped form. The parameters of the model determine its shape (Zar, Figure 6.2).

The Normal Distribution: Review
The normal distribution is a two-parameter distribution, defined by its mean μ and variance σ², and is symmetric around μ.

Distribution of Samples and the Central Limit Theorem
How well do samples represent the population? A key foundation of frequentist statistics is that samples are random variables: when we take a sample from the population, we are taking one of many possible samples. Thought experiment: take many, many samples from a population.

Distribution of Samples and the Central Limit Theorem (continued)
Do we expect all samples to have the same mean value (the same sample mean)? No; there is variation among samples, called sampling variation. The frequency histogram of the sample means is the sampling distribution. Analogous to the standard deviation (which tells us how well the model fits the data), we can take the standard deviation of the sample means, termed the standard error of the sampling mean, or simply the standard error. If the sample is large, the standard error can be approximated as SE = s / sqrt(n), where s is the sample standard deviation and n is the sample size.
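The following is a minimal R sketch (not from the lecture; the population and sample sizes are assumed) of the thought experiment above: draw many samples, look at the sampling distribution of the mean, and compare its spread with the SE approximation.

# Sketch: sampling distribution of the mean and SE = s/sqrt(n)
set.seed(42)
population <- rnorm(100000, mean = 50, sd = 10)   # hypothetical population
n <- 40                                           # assumed sample size

# Take many, many samples and record each sample mean
sample_means <- replicate(5000, mean(sample(population, n)))
hist(sample_means, main = "Sampling distribution of the mean")

# Empirical SD of the sample means vs. the analytic approximation
sd(sample_means)                       # close to 10 / sqrt(40)
sd(sample(population, n)) / sqrt(n)    # SE estimated from a single sample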

Assumption #1: Normality
Samples and model residuals are normally distributed. We will review how to check this assumption in this lecture and the following lab.

Assumption #2: Homogeneity of Variance
Data taken from different groups must have homogeneous variance. Homogeneous does not mean exactly equal, but equal in the probabilistic sense. We will review how to check this assumption in this lecture and the following lab.

Assumption #3: Interval or Ratio Scale
Continuous variables:
- Interval scale: equal intervals between measurements.
- Ratio scale: interval data for which ratios of measurements are also meaningful.
Ordinal data are rankings (darker, faster, shorter), which might be labelled 1, 2, 3, 4, 5 to reflect increasing magnitude; they convey less information (data condensation).
Nominal or categorical data: examples are public survey responses (willingness to vote for a candidate), economic class, and taxonomic categories.

Assumption #4: Independence of Observations
The measurement of one sample does not influence the measurement of another sample. Measurements taken across space and time are common examples; the experimenter needs to determine whether there is zero correlation between the samples. Behavioral example: the opinion of one person influences the behavior of another person, and hence the measurements are correlated.

Assessing Normality
We don't have access to the population distribution, so we usually test the observed data:
- Graphical displays: Q-Q plot (or P-P plot), histogram
- Formal tests: Kolmogorov-Smirnov, Shapiro-Wilk
A sketch of these checks in R follows below.
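A minimal R sketch of the checks listed above, using base R on an assumed variable (the data and variable name are hypothetical, not from the lecture):

# Sketch: graphical and formal normality checks
set.seed(1)
scores <- rnorm(50, mean = 100, sd = 15)   # hypothetical sample

# Graphical displays
hist(scores, main = "Histogram of scores")
qqnorm(scores); qqline(scores)             # Q-Q plot against a normal

# Formal tests
shapiro.test(scores)                                       # Shapiro-Wilk
ks.test(scores, "pnorm", mean = mean(scores), sd = sd(scores))
# Note: estimating the mean and SD from the same data makes the K-S
# p-value only approximate (the Lilliefors correction addresses this).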

Assessing Homogeneity of Variance
- Figures (plots of spread by group)
- Levene's test: tests whether the variances in different groups are the same. Significant = variances are not equal; non-significant = variances can be treated as equal.
- Variance ratio: with 2 or more groups, VR = largest variance / smallest variance. If VR < 2, homogeneity can be assumed.
A sketch of these checks in R follows below.
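A minimal R sketch of the two checks above, on an assumed three-group data set (data and names are hypothetical, not from the lecture); leveneTest() comes from the car package.

# Sketch: Levene's test and the variance ratio
library(car)
set.seed(2)
dat <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 30)),
  score = c(rnorm(30, 10, 2), rnorm(30, 12, 2.5), rnorm(30, 11, 3))
)

# Levene's test: a significant p-value suggests unequal variances
leveneTest(score ~ group, data = dat)

# Variance ratio: largest group variance divided by the smallest
vars <- tapply(dat$score, dat$group, var)
max(vars) / min(vars)    # < 2 suggests homogeneity can be assumed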

Correcting Data Problems
- Log transformation, log(x_i) or log(x_i + 1): reduces positive skew.
- Square root transformation, sqrt(x_i): also reduces positive skew, and can be useful for stabilizing variance.
- Reciprocal transformation, 1/x_i: dividing 1 by each score also reduces the impact of large scores. This transformation reverses the ordering of the scores; you can avoid this by reversing the scores before transforming, 1/(x_highest − x_i).
A sketch of these transformations in R follows below.
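A minimal R sketch applying the transformations above to an assumed positively skewed variable (the data are hypothetical, not from the lecture):

# Sketch: skew-reducing transformations
set.seed(3)
x <- rexp(100, rate = 0.2)            # hypothetical positively skewed scores

log_x   <- log(x + 1)                 # log transform (+1 guards against zeros)
sqrt_x  <- sqrt(x)                    # square-root transform
recip_x <- 1 / x                      # reciprocal (reverses the ordering)
recip_keep_order <- 1 / (max(x) + 1 - x)   # reversed first so order is kept
                                           # (+1 avoids dividing by zero)

par(mfrow = c(2, 2))
hist(x); hist(log_x); hist(sqrt_x); hist(recip_keep_order)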

To Transform or Not?
- Transforming the data helps as often as it hinders the accuracy of F.
- The central limit theorem: the sampling distribution will be approximately normal in samples of n > 40 anyway.
- Transforming the data changes the hypothesis being tested. E.g., when using a log transformation and comparing means, you change from comparing arithmetic means to comparing geometric means (illustrated below).
- In small samples it is tricky to determine normality one way or another.
- The consequences for the statistical model of applying the wrong transformation could be worse than the consequences of analysing the untransformed scores.
- Alternative: use non-parametric statistics.
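A minimal R sketch (assumed data, not from the lecture) of why comparing means of logged data amounts to comparing geometric rather than arithmetic means:

# Sketch: mean of log(x), back-transformed, is the geometric mean
set.seed(4)
x <- rlnorm(100, meanlog = 1, sdlog = 0.8)   # hypothetical positive scores

mean(x)                     # arithmetic mean
exp(mean(log(x)))           # back-transformed mean of log(x) = geometric mean
prod(x)^(1 / length(x))     # geometric mean computed directly (same value)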

Statistical Hypothesis Testing
1. State the null hypothesis H0 and the alternative hypothesis HA.
2. Declare the alpha level.
3. Collect the data.
4. Compare the test statistic to the critical value (determined by alpha).
5. State the resulting probability.

Statistical Hypothesis Testing: Example
Test whether the population mean differs from some specified value:
H0: μ = 0
HA: μ ≠ 0
This introduces the idea of a critical value, here with an alpha level of 0.05.

Statistical Hypothesis Testing: Example
We have data on the weight change of horses given some medical treatment. We want to know whether the mean change in weight we found, +1.29 kg, is significantly different from 0 kg. We calculate the z-score and find that Z = 1.45.
P(mean ≥ +1.29) = P(Z ≥ 1.45) = ?
P(mean ≤ −1.29) = P(Z ≤ −1.45) = ?

Statistical Hypothesis Testing: Example (continued)
P(mean ≥ +1.29) = P(Z ≥ 1.45) = ?
P(mean ≤ −1.29) = P(Z ≤ −1.45) = ?
Z = ±1.96 marks the boundary of the region of rejection (2.5% in each tail). Because Z = 1.45 does not fall in the region of rejection, we fail to reject H0. We now have a way to objectively reject or retain the null hypothesis.
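A minimal R sketch (not from the lecture slides) of the calculation above, using the Z value quoted on the slide:

# Sketch: tail probabilities and the critical value for the z-test
z <- 1.45

# One-tailed probabilities
pnorm(z, lower.tail = FALSE)   # P(Z >= 1.45), about 0.074
pnorm(-z)                      # P(Z <= -1.45), about 0.074

# Two-tailed p-value and the critical value for alpha = 0.05
2 * pnorm(abs(z), lower.tail = FALSE)   # about 0.147, greater than 0.05
qnorm(0.975)                            # critical value, about 1.96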


One- and Two-Tailed Tests
A two-tailed test simply asks whether the value is different from the hypothesized value. In some cases we care about the direction of the difference; then a one-tailed test can be used instead.

One- and Two-Tailed Tests
Contrast the regions of rejection: at α = 0.05, a two-tailed test places 2.5% in each tail, whereas a one-tailed test places the full 5% in a single tail.
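A minimal R sketch (not from the lecture) contrasting the critical values that define the two regions of rejection at alpha = 0.05:

qnorm(0.95)    # one-tailed critical value, about 1.645 (all 5% in one tail)
qnorm(0.975)   # two-tailed critical value, about 1.96 (2.5% in each tail)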

Type 1 and Type 2 Errors
Sometimes we make mistakes: we reject the null hypothesis when it is true (equivalently, we accept the alternative hypothesis when it is false). This is a Type 1 error, or alpha error: the frequency of rejecting H0 when it is true. The Type 1 error rate is equal to alpha.

Type 1 and Type 2 Errors
Type I error: rejecting the null hypothesis when it is true. The Type 1 (α) error rate is equal to α, which gives us a criterion for choosing alpha: if your α is 0.10, you have a 10% probability of rejecting the null hypothesis when you should, in fact, have retained it.
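A minimal R sketch (not from the lecture; the test and sample size are assumed) verifying by simulation that the Type 1 error rate matches alpha when H0 is true:

# Sketch: proportion of false rejections when H0 (mu = 0) is true
set.seed(5)
alpha <- 0.10

p_values <- replicate(10000, t.test(rnorm(30, mean = 0))$p.value)
mean(p_values < alpha)   # proportion of false rejections, close to 0.10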

Type 1 and Type 2 Errors
Type II error: accepting (failing to reject) the null hypothesis when it is false. The Type 2 (β) error rate is equal to β.

Type 1 and Type 2 Errors
Thought experiment: consider the relative costs of the two errors in, for example, endangered species conservation versus pharmaceutical testing.

Type 1 and Type 2 Errors
Decision              H0 is true            H0 is false
Reject H0             Type I error (α)      No error
Fail to reject H0     No error              Type II error (β)

Power
Power is the probability that a statistical test will reject the null hypothesis when it is false; power = 1 − β.

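A minimal R sketch (not from the lecture; the design and numbers are hypothetical) computing power for a two-sample design with base R's power.t.test(); varying n, delta, sd, or sig.level in the call shows how each factor influences power.

# Sketch: power of a two-sample t-test
power.t.test(n = 30, delta = 5, sd = 10, sig.level = 0.05)   # power about 0.48

# Sample size needed to reach 80% power for the same effect
power.t.test(delta = 5, sd = 10, sig.level = 0.05, power = 0.80)  # n about 64 per group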

What Influences Statistical Power?