Difference tests (1): parametric

NST 1B Experimental Psychology Statistics practical
Rudolf Cardinal & Mike Aitken
2/3 December 2003; Department of Experimental Psychology, University of Cambridge
Handouts: Answers to Examples 2 (from last time); Handout 3 (diff. tests 1); Examples 3 (diff. tests 1); Homophone practical data
pobox.com/~rudolf/psychology

Reminder: basic principles and Z

Reminder: the logic of null hypothesis testing

Research hypothesis (H1): e.g. measure the weights of 50 joggers and 50 non-joggers; the research hypothesis might be "there is a difference between the weights of joggers and non-joggers; the population mean of joggers is not the same as the population mean of non-joggers". It is usually very hard to calculate the probability of a research hypothesis (sometimes because research hypotheses are poorly specified: for example, how big a difference?).

Null hypothesis (H0): e.g. "there is no difference between the population means of joggers and non-joggers; any observed differences are due to chance". Calculate the probability of finding the observed data (e.g. a difference at least as large as the one observed) if the null hypothesis is true. This is the p value. If p is very small, reject the null hypothesis ("chance alone is not a good enough explanation"). Otherwise, retain the null hypothesis (Occam's razor: chance is the simplest explanation). The criterion level of p is called α.

Reminder: α, β, and errors we can make

Decision \ True state of the world | H0 true | H0 false
Reject H0 | Type I error (probability α) | Correct decision (probability 1 − β = power)
Do not reject H0 | Correct decision (probability 1 − α) | Type II error (probability β)
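To see the α cell of this table in action, the sketch below repeatedly samples from a population in which H0 is true and counts how often a two-tailed Z test at α = 0.05 rejects H0. It uses only the Python standard library; the population values, sample size, seed and number of simulated experiments are arbitrary illustrative choices, not taken from the practical. The long-run rejection rate should come out close to α.

```python
import math
import random
from statistics import NormalDist, fmean

def type_i_error_rate(n_experiments: int = 20_000, n: int = 10,
                      mu: float = 100.0, sigma: float = 15.0,
                      alpha: float = 0.05, seed: int = 1) -> float:
    """Fraction of simulated experiments in which a two-tailed Z test rejects a true H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    sem = sigma / math.sqrt(n)                    # known sigma, so this is exact
    rejections = 0
    for _ in range(n_experiments):
        # H0 is true by construction: every sample comes from N(mu, sigma)
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / sem
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_experiments

rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```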

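Reminder: the distribution of Z (the standard normal distribution)

[Figure: the standard normal distribution of Z, with mean 0 and SD 1.] $z = \frac{x - \mu}{\sigma}$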

Reminder: if we know µ and σ, we can test hypotheses about single observations with a Z test

Example: we know IQs are distributed with a mean (µ) of 100 and a standard deviation (σ) of 15 in the healthy population. If we select a single person from our population, what is the probability that he/she has an IQ of 60 or less?

$$z = \frac{x - \mu}{\sigma} = \frac{60 - 100}{15} = -2.667$$

Our tables will tell us that the probability of finding a Z score less than +2.667 is 0.9962. So the probability of finding a Z score less than −2.667 is 1 − 0.9962 = 0.0038 (since the Z curve is symmetrical about zero).
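The table lookup above can be reproduced with the standard library's `statistics.NormalDist`, which provides the standard normal CDF. This is a minimal sketch of the same one-tailed calculation, using the IQ numbers from the example.

```python
from statistics import NormalDist

mu, sigma = 100, 15      # healthy-population IQ parameters from the example
x = 60                   # the individual's IQ

z = (x - mu) / sigma                 # about -2.667
p_one_tailed = NormalDist().cdf(z)   # P(Z <= z), about 0.0038
print(f"z = {z:.3f}, p = {p_one_tailed:.4f}")
```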

Reminder: one- and two-tailed tests

In our example, we asked for the probability of finding an individual with an IQ of 60 or less in the normal population. This is equivalent to a hypothesis test where the null hypothesis is "the individual comes from the normal population with mean 100 and SD 15". We calculate p, and would reject our null hypothesis if p < α (where α is typically 0.05 by arbitrary convention). This would be a one-tailed test. It would fail to detect deviations from the mean in the other direction (IQ > mean). To detect both, we use a two-tailed test, and allocate α/2 for testing each tail to keep the overall Type I error rate at α.
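The difference between the two kinds of test can be sketched in a few lines, again assuming `statistics.NormalDist` for the normal distribution. The two-tailed p is twice the one-tailed p, and splitting α across both tails moves the critical |z| from about 1.645 to about 1.96.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

z = (60 - 100) / 15                            # IQ example: z is about -2.667
p_one = std_normal.cdf(z)                      # lower tail only
p_two = 2 * std_normal.cdf(-abs(z))            # both tails
z_crit_one = std_normal.inv_cdf(1 - alpha)     # about 1.645
z_crit_two = std_normal.inv_cdf(1 - alpha / 2) # about 1.96 (alpha/2 in each tail)

print(f"one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")
print(f"critical |z|: one-tailed {z_crit_one:.3f}, two-tailed {z_crit_two:.3f}")
```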

The t test

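The sampling distribution of the mean

[Figure: distribution of the population (probability distribution of rolls of a single die), µ = 3.50, σ = 1.71; and distribution of sample means for samples of size n = 40, with $\mu_{\bar{x}} = \mu = 3.50$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 1.71/\sqrt{40} = 0.27$.]

The Central Limit Theorem: given a population with mean µ and variance σ², from which we take samples of size n, the distribution of sample means will have a mean $\mu_{\bar{x}} = \mu$, a variance $\sigma^2_{\bar{x}} = \sigma^2/n$, and a standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. As the sample size increases, the distribution of the sample mean will approach the normal distribution. This lets us test hypotheses about groups of observations (samples): for a given n, we can find out the probability of obtaining a particular sample mean.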
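The die example can be checked by simulation. This sketch (standard library only; the seed and the 20,000 repetitions are arbitrary choices) draws many samples of n = 40 rolls and compares the mean and SD of the sample means with µ = 3.50 and σ/√n ≈ 0.27.

```python
import math
import random
from statistics import fmean, pstdev

FACES = [1, 2, 3, 4, 5, 6]
mu = fmean(FACES)        # population mean, 3.5
sigma = pstdev(FACES)    # population SD, about 1.71
n = 40                   # sample size from the figure

rng = random.Random(42)
sample_means = [fmean(rng.choices(FACES, k=n)) for _ in range(20_000)]

print(f"mean of sample means: {fmean(sample_means):.3f} (theory {mu:.2f})")
print(f"SD of sample means:   {pstdev(sample_means):.3f} "
      f"(theory {sigma / math.sqrt(n):.3f})")
```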

If we know the population SD, σ, we can test hypotheses about samples with a Z test

Example: we know IQs are distributed with a mean (µ) of 100 and a standard deviation (σ) of 15 in the healthy population. Suppose we take a single sample of 5 people and find their IQs are {140, 121, 95, 105, 91}. What is the probability of obtaining data with this sample mean or greater from the healthy population?

Well, we can work out our sample mean: $\bar{x} = 110.4$. We know n = 5. We know the mean of all sample means from this population, $\mu_{\bar{x}} = \mu = 100$, and the standard deviation of all sample means (often called the standard error of the mean):

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{5}} = 6.708$$

So we can work out a Z score:

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{110.4 - 100}{6.708} = 1.55$$

Our tables will tell us that P(Z < 1.55) = 0.9394. So P(Z > 1.55) = 1 − 0.9394 = 0.0606. We'd report p ≈ 0.06 for our (one-tailed) test.
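The same Z test on a sample mean, sketched with the standard library. The only change from the single-observation case is that the denominator is the standard error of the mean, σ/√n, rather than σ.

```python
import math
from statistics import NormalDist, fmean

iq = [140, 121, 95, 105, 91]   # the sample of five IQs
mu, sigma = 100, 15            # known population parameters

x_bar = fmean(iq)                    # 110.4
sem = sigma / math.sqrt(len(iq))     # standard error of the mean, about 6.708
z = (x_bar - mu) / sem               # about 1.55
p = 1 - NormalDist().cdf(z)          # one-tailed: P(mean this large or larger)
print(f"mean = {x_bar}, SEM = {sem:.3f}, z = {z:.2f}, p = {p:.3f}")
```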

But normally, we don't know σ. So we have to use a t test

If we don't know the population SD, σ (and very often we don't), we can't use this test:

$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Instead, we can calculate a number using the sample SD, $s_X$ (which we can easily calculate), as an estimator of the population SD (which we don't know):

$$t = \frac{\bar{x} - \mu}{s_X / \sqrt{n}}$$

But this number, which we call t, does NOT have the same distribution as Z.

The distribution of t: Student's (Gosset's) t distribution

"As is so often the case, beer made a statistical problem go away." Student [Gosset, W. S.] (1908). The probable error of a mean. Biometrika 6: 1-25.

The distribution of t when the null hypothesis is true depends on the sample size (n − 1 d.f.). When d.f. = ∞, the t distribution (under H0) is the same as the normal distribution.

Degrees of freedom (df)

(Few understand this well!) Estimates of parameters can be based upon different amounts of information. The number of independent pieces of information that go into the estimate of a parameter is called the degrees of freedom (d.f. or df). Or, the number of observations free to vary. (Example: 3 numbers and a mean. Once the mean is fixed, only 2 of the numbers can vary freely; the third is then determined.) Or, the df is the number of measurements exceeding the amount absolutely necessary to measure the object (or parameter) in question. To measure the length of a rod requires 1 measurement. If 10 measurements are taken, then the set of 10 measurements has 9 df.

In general, the df of an estimate is the number of independent scores that go into the estimate minus the number of parameters estimated from those scores as intermediate steps. For example, if the population variance σ² is estimated by the sample variance

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

from a random sample of n independent scores, then the number of degrees of freedom is equal to the number of independent scores (n) minus the number of parameters estimated as intermediate steps (one, as µ is estimated by $\bar{x}$), and is therefore n − 1.

Two statisticians are drinking in a bar. One turns to the other and asks "So how are you finding married life?" The other replies "It's okay, but you lose a degree of freedom." The first chuckles evilly. "You need a larger sample."
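One practical consequence of losing a degree of freedom when µ is estimated by $\bar{x}$: dividing the sum of squares by n − 1 gives an unbiased estimate of σ², whereas dividing by n underestimates it by a factor of (n − 1)/n. The simulation below sketches this with Python's `statistics.variance` (divides by n − 1) and `statistics.pvariance` (divides by n); the sample size, seed and repetition count are arbitrary illustrative choices.

```python
import random
from statistics import fmean, pvariance, variance

rng = random.Random(0)
n = 5                        # small samples make the bias easy to see
unbiased, biased = [], []
for _ in range(10_000):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # true variance is 1
    unbiased.append(variance(sample))    # divides by n - 1 (the df)
    biased.append(pvariance(sample))     # divides by n

print(f"average s^2 dividing by n - 1: {fmean(unbiased):.3f}")
print(f"average dividing by n:         {fmean(biased):.3f}")
```

With n = 5 the second average should sit near (n − 1)/n = 0.8 rather than 1.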

Critical values of t (for a given number of d.f.)

The critical value of t for a given α falls as d.f. increases, approaching the corresponding Z value (e.g. 1.96 for α = 0.05 two-tailed) as d.f. → ∞.
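Python's standard library has no t distribution (SciPy's `scipy.stats.t.ppf` is the usual tool for this). As a self-contained sketch, the code below integrates the t density numerically with Simpson's rule and finds the critical value by bisection. The helper names are hypothetical and the step counts are illustrative rather than tuned; the results should match the tabulated values used in these notes, e.g. about 2.365 for 7 df and 2.262 for 9 df (α = 0.05, two-tailed).

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t): Simpson's rule on the density from 0 to |t| (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3          # area under the density between 0 and |t|
    return 0.5 + half if t >= 0 else 0.5 - half

def t_critical(df: int, alpha: float = 0.05, two_tailed: bool = True) -> float:
    """Positive t whose upper-tail area is alpha/2 (two-tailed) or alpha (one-tailed)."""
    tail = alpha / 2 if two_tailed else alpha
    lo, hi = 0.0, 100.0
    for _ in range(50):           # bisection on the upper-tail area
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (7, 9, 96):
    print(df, round(t_critical(df), 3))
```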

The one-sample t test

We've just seen the logic behind this. We calculate t according to this formula:

$$t = \frac{\bar{x} - \mu}{s_X / \sqrt{n}}, \qquad \text{df} = n - 1$$

where $\bar{x}$ is the sample mean, $s_X$ the sample SD, n the sample size, µ the test value, and $s_X/\sqrt{n}$ the standard error of the mean (SEM), i.e. the standard deviation of the distribution of sample means.

Degrees of freedom: we have n observations and have calculated one intermediate parameter ($\bar{x}$, which estimates µ in the calculation of $s_X$), so t has n − 1 df. The null hypothesis is that the sample comes from a population with mean µ. Look up the critical value of t (for a given α) using your table of t for the correct number of degrees of freedom (n − 1). If your t is bigger (in absolute value, for a two-tailed test), it's significant.

The one-sample t test: EXAMPLE (1)

It has been suggested that 15-year-olds should sleep 8 hours per night. We measure sleep duration in 8 such teenagers and find that they sleep {8.3, 5.4, 7.2, 8.1, 7.6, 6.2, 9.1, 7.3} hours per night. Does their group mean differ from 8 hours per night?

The one-sample t test: EXAMPLE (2)

Sample mean $\bar{x}$ = 7.4; sample SD $s_X$ = 1.178; population mean to test µ = 8; sample size n = 8; df = n − 1 = 7.

$$t = \frac{\bar{x} - \mu}{s_X / \sqrt{n}} = \frac{7.4 - 8}{1.178 / \sqrt{8}} = -1.44$$

For 7 df and α = 0.05 two-tailed, critical t = 2.365. Since our t is not as large as the critical value (|−1.44| < 2.365), we do not reject the null hypothesis. Not significant; p > 0.05. We have not established that, as a group, they sleep less than 8 h per night.
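The sleep example as a short sketch. `one_sample_t` is a hypothetical helper written for illustration; `statistics.stdev` already divides by n − 1, so it gives $s_X$ directly. The critical value 2.365 is the tabulated one quoted above.

```python
import math
from statistics import fmean, stdev

def one_sample_t(values, mu):
    """Return (t, df) for H0: the population mean equals mu."""
    n = len(values)
    sem = stdev(values) / math.sqrt(n)   # stdev() uses the n - 1 divisor
    return (fmean(values) - mu) / sem, n - 1

sleep_hours = [8.3, 5.4, 7.2, 8.1, 7.6, 6.2, 9.1, 7.3]
t, df = one_sample_t(sleep_hours, mu=8.0)
t_crit = 2.365   # two-tailed critical t for 7 df, alpha = 0.05 (from tables)
print(f"t = {t:.2f} on {df} df; significant: {abs(t) > t_crit}")
```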

Paired and unpaired tests (related and unrelated data)

Now we'll look at t tests with two samples. In general, two samples can be related or unrelated.

Related: e.g. measuring the same subjects twice; measuring a large set of twins; any situation in which two measurements are more likely to resemble each other than by chance alone within the domain of interest.

Unrelated: where no two measurements are related.

Example: measuring digit span on land and underwater. We could use either a related (within-subject) design: measure ten people on land; measure the same ten people underwater. Good performers on land are likely to be good performers underwater; the two scores from the same subject are related. Or an unrelated (between-subject) design: measure ten people on land and another ten people underwater.

If there is relatedness in your data, your analysis must take account of it. This may give you more power (e.g. if the data are paired, a paired test has more power than an unpaired test; unpaired tests may give Type II errors).

Beware pseudoreplication: e.g. measure one person ten times on land; measure another person ten times underwater; pretend that n = 20. In fact, n = 2, as repeated measurements of the same person do not add much more information (they're all likely to be similar). Result: Type I errors.

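The two-sample, paired t test

Very simple. Calculate the difference between each pair of observations. Then perform a one-sample t test on the differences, comparing them to zero. (Null hypothesis: the mean difference is zero.)

$$t = \frac{\bar{x} - \mu}{s_X / \sqrt{n}}$$

where $\bar{x}$ and $s_X$ are the mean and SD of the n differences, and µ is the test value for the differences (zero for the null hypothesis "there is no difference").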

The two-sample, paired t test: EXAMPLE (1)

Looking at high-frequency words only, does the rate of errors that you made while categorizing homophones differ from the error rate when categorizing non-homophone (control) words, i.e. is there a non-zero homophone effect? (Each subject categorized both homophones and control words, so we will use a paired t test.) Relevant difference scores are labelled "% errors homophone effect high f" on your summary sheet.

The two-sample, paired t test: EXAMPLE (2)

Differences: {8.3, 0.0, 8.3, …}; mean of differences $\bar{x}$ = 3.12; sample SD of differences $s_X$ = 9.614; mean difference under the null hypothesis µ = 0; sample size n = 97 differences; df = 96.

$$t = \frac{\bar{x} - \mu}{s_X / \sqrt{n}} = \frac{3.12 - 0}{9.614 / \sqrt{97}} = 3.196$$

For 96 df and α = 0.05 two-tailed, critical t ≈ 1.985. Since our t is larger than the critical value, we reject the null hypothesis. Significant; p < 0.05. In fact, p < 0.01, since critical t for α = 0.01 and 96 df is approx. 2.63. You made more errors for homophones (p < 0.01 two-tailed).
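Because the paired test is just a one-sample test on difference scores, a sketch needs only a few lines. The helper names are hypothetical, and the three-subject toy data are invented purely to show the call; the last line reproduces the homophone result from its summary statistics.

```python
import math
from statistics import fmean, stdev

def one_sample_t(values, mu=0.0):
    """Return (t, df) for H0: the population mean equals mu."""
    n = len(values)
    return (fmean(values) - mu) / (stdev(values) / math.sqrt(n)), n - 1

def paired_t(condition_a, condition_b):
    """Paired t test: one-sample t test of the within-subject differences against 0."""
    if len(condition_a) != len(condition_b):
        raise ValueError("paired data need one score per subject in each condition")
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    return one_sample_t(diffs, 0.0)

# Hypothetical toy data: three subjects, two conditions each
t_toy, df_toy = paired_t([3.0, 5.0, 7.0], [1.0, 2.0, 4.0])

# Homophone example, from its summary statistics only
mean_diff, sd_diff, n = 3.12, 9.614, 97
t_summary = mean_diff / (sd_diff / math.sqrt(n))   # about 3.20 on n - 1 = 96 df
print(f"toy: t = {t_toy:.2f} on {df_toy} df; homophones: t = {t_summary:.3f}")
```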

The two-sample, unpaired t test: (a) equal sample variances

How can we test the difference between two independent samples? In other words, do both samples come from underlying populations with the same mean? (= null hypothesis.) Basically, if the sample means are very far apart, as measured by something that depends (somehow) on the variability of the samples, then we will reject the null hypothesis. As always,

$$t = \frac{\text{something}}{\text{standard error of the something}}$$

(the standard error of the something being the SD of an infinite set of samples of the something). In this case,

$$t = \frac{\text{difference between the means}}{\text{standard error of the difference between the means (SED)}}$$

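The two-sample, unpaired t test: (a) equal sample variances

Don't worry about how we calculate the SED (it's in section 3 of the handout, if you're bizarrely keen). The precise format of the t test depends on whether the two samples have the same variance.

If the two samples have the same variance, and the samples are the same size (n1 = n2 = n), the formula becomes a bit simpler:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n}}}$$

If the samples are not the same size (n1 ≠ n2), we first calculate something called the pooled variance ($s_p^2$) and use that to get t:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}$$

We have n1 + n2 observations and estimated 2 parameters (the means, used to calculate the two s values), so we have n1 + n2 − 2 df.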

The two-sample, unpaired t test: EXAMPLE (1)

Silly example. In low-frequency word categorization where those words are homophones (the hardest condition, judged by mean error rate), were there differences between males and females? % errors:

Female: n = 61; mean 12.7; SD 13.4
Male: n = 27; mean 15.7; SD 15.2

Use the equal-variance unpaired t test (unequal n formula).

The two-sample, unpaired t test: EXAMPLE (2)

Call females group 1 and males group 2. Unequal n, so...

$$s_p^2 = \frac{(61 - 1) \times 13.4^2 + (27 - 1) \times 15.2^2}{61 + 27 - 2} = 195.12$$

$$t = \frac{12.7 - 15.7}{\sqrt{\dfrac{195.12}{61} + \dfrac{195.12}{27}}} = -0.93, \qquad \text{df} = 61 + 27 - 2 = 86$$

For 86 df and α = 0.05 two-tailed, critical t ≈ 1.99. Not a significant difference.

Caveat: some people were ignored because there wasn't enough of your name to judge your sex by it, or because I was incapable of predicting your sex from your name. So these data may not be wholly accurate!
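A minimal sketch of the pooled-variance calculation from summary statistics, using the female/male numbers above. `pooled_t` is a hypothetical helper; the formula is the unequal-n one given earlier, which also works when n1 = n2.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance unpaired t test from summary statistics: returns (t, df, s_p^2)."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df   # pooled variance
    sed = math.sqrt(sp2 / n1 + sp2 / n2)                 # standard error of the difference
    return (mean1 - mean2) / sed, df, sp2

t, df, sp2 = pooled_t(12.7, 13.4, 61, 15.7, 15.2, 27)   # females (1) vs males (2)
print(f"s_p^2 = {sp2:.2f}, t = {t:.2f} on {df} df")
```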

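The two-sample, unpaired t test: (b) unequal sample variances

If the two samples do not have the same variance, the number we calculate would not have the same distribution as t. What we can do is use our previous (simpler) formula, generalized to unequal n, but call the result t′:

$$t' = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

We then test our t′ as if it were a t score, but with a different number of degrees of freedom. A computer can calculate a more accurate df (the Welch-Satterthwaite approximation, which is never smaller than the hand rule and so gives more power), but by hand we do something simpler and more conservative: df = (n1 − 1) or (n2 − 1), whichever is smaller.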
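The sketch below computes t′ together with both the Welch-Satterthwaite df and the conservative hand-calculation df. `welch_t` is a hypothetical helper; the example reuses the female/male summary statistics purely to illustrate the arithmetic (those two variances are similar, so the pooled test above is the appropriate one there).

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance t' with Welch-Satterthwaite df and the conservative hand df."""
    a, b = sd1**2 / n1, sd2**2 / n2            # squared standard errors of each mean
    t_prime = (mean1 - mean2) / math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    df_hand = min(n1 - 1, n2 - 1)              # the simpler rule for hand calculation
    return t_prime, df_welch, df_hand

t_prime, df_welch, df_hand = welch_t(12.7, 13.4, 61, 15.7, 15.2, 27)
print(f"t' = {t_prime:.3f}; Welch df = {df_welch:.1f}; hand df = {df_hand}")
```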

Are the variances equal or not? The F test

So how can we tell if the variances are the same or different for t testing? (a) We can look at them. It may be obvious. (b) We can perform a statistical test to compare the two variances. A popular test (not the best one, but a reasonable and easy one) is the F test. F is the ratio of two variances. Since our tables will give us critical values for F > 1 (but not F < 1), we make sure F ≥ 1 by putting the bigger variance on top:

$$F_{n_1 - 1,\, n_2 - 1} = \frac{s_1^2}{s_2^2} \text{ if } s_1^2 > s_2^2, \qquad F_{n_2 - 1,\, n_1 - 1} = \frac{s_2^2}{s_1^2} \text{ if } s_2^2 > s_1^2$$

The null hypothesis is that the variances are the same (F = 1). If our F exceeds the critical F for the relevant number of df (note that there are separate df for the numerator and the denominator), we reject the null hypothesis. Since we have ensured that F ≥ 1, we run a one-tailed test on F, so double the stated one-tailed α to get the two-tailed α for the question "are the variances different?".
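A minimal sketch of forming F with the larger variance on top and keeping track of which df goes with the numerator. `f_ratio` is a hypothetical helper; the critical F still has to come from tables (or a statistics package), which this sketch does not attempt.

```python
def f_ratio(sd1, n1, sd2, n2):
    """Variance ratio with the larger variance on top, plus (numerator df, denominator df)."""
    v1, v2 = sd1**2, sd2**2
    if v1 >= v2:
        return v1 / v2, (n1 - 1, n2 - 1)
    return v2 / v1, (n2 - 1, n1 - 1)

# Female (SD 13.4, n = 61) vs male (SD 15.2, n = 27) error rates from the example
F, (df_num, df_den) = f_ratio(13.4, 61, 15.2, 27)
print(f"F = {F:.3f} with ({df_num}, {df_den}) df")
```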

Assumptions of the t test

The mean is meaningful. If you compare the football shirt numbers worn by England strikers who've scored many goals for their country with those worn by less successful strikers, you might find that the successful strikers have a lower mean shirt number than the less successful strikers. So what?

The underlying scores (for one-sample and unpaired t tests) or difference scores (for paired t tests) are normally distributed. Rule of thumb: if n > 30, you're fine to assume this. If n > 15 and the data don't look too weird, it's probably OK. Otherwise, bear this in mind.

To use the equal-variance version of the unpaired two-sample t test, the two samples must come from populations with equal variances (whether or not n1 = n2). (There's a helpful clue to remember that one in the name of the test.) The t test is fairly robust to violations of this assumption (it gives a good estimate of the p value) if n1 = n2, but not if n1 ≠ n2.

Parametric and non-parametric tests

The t test is a parametric test: it makes assumptions about parameters of the underlying populations (such as the distribution, e.g. assuming that the data are normally distributed). If these assumptions are violated: (a) we can transform the data to fit the assumptions better (NOT covered at Part 1B level) or (b) we can use a nonparametric ("distribution-free") test that doesn't make the same assumptions. In general, if the assumptions of parametric tests are met, they are the most powerful. If not, we may need to use nonparametric tests. They may, for example, answer questions about medians rather than means. We'll cover some next time.

Final thoughts and techniques

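Drawing and interpreting between- and within-subject effects

If groups have the same n and SEM, then SED = √2 × SEM and

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{2}\,\text{SEM}}$$

So if SEM bars overlap, the means differ by < 2 SEM, so t < 2/√2 ≈ 1.41: never significant. t = difference/SED, so the SED is always an appropriate index of comparison. If the difference > 2 SED, then t > 2: usually significant for reasonably large n.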
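The overlapping-bars rule of thumb follows directly from SED = √(SEM1² + SEM2²). In this sketch the group means and SEMs are hypothetical numbers chosen so that the ±1 SEM bars just touch (means 2 SEM apart), which gives the largest t possible without overlap.

```python
import math

def t_from_group_sems(mean1, sem1, mean2, sem2):
    """Unpaired t from group means and SEMs: difference / SED."""
    sed = math.sqrt(sem1**2 + sem2**2)   # with equal SEMs this is sqrt(2) * SEM
    return (mean1 - mean2) / sed

# Hypothetical groups whose +/-1 SEM bars just touch: means differ by exactly 2 SEM
t_touching = t_from_group_sems(12.0, 1.0, 10.0, 1.0)
print(f"t when SEM bars just touch: {t_touching:.3f}")
```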

Reminder: multiple comparisons are potentially evil

For n independent tests, each with α = 0.05, if every null hypothesis is true: P(at least one Type I error) = 1 − P(no Type I error) = 1 − (1 − 0.05)^n.

Number of tests (n) | P(at least one Type I error)
1 | 0.05
2 | 0.0975
3 | 0.1426
4 | 0.1855
5 | 0.2262
100 | 0.9941

(But remember, you can't make a Type I error (saying something is significant when it isn't at all) unless the null hypothesis is actually true. So these are all maximum Type I error rates.)
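The table can be regenerated with a one-line function. Like the table, this sketch assumes the tests are independent and every null hypothesis is true.

```python
def familywise_error(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one Type I error) across n independent tests when every H0 is true."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 2, 3, 4, 5, 100):
    print(k, round(familywise_error(k), 4))
```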

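Confidence intervals using t

If we know the mean and SD of a sample, we could perform a t test to see if it differed from a given number. We could repeat that for every possible number... The range of values of µ that would not be rejected at level α is

$$\mu = \bar{x} \pm t_{\text{critical}} \frac{s_X}{\sqrt{n}} \quad (t_{\text{critical}} \text{ for } n - 1 \text{ df})$$

Example: we measure the heights of 10 UK men. Sample mean $\bar{x}$ = 1.82 m, $s_X$ = 0.08 m. For n = 10 (df = 9), t critical for α = 0.05 two-tailed is ±2.262. Therefore

$$\mu = 1.82 \pm 2.262 \times \frac{0.08}{\sqrt{10}} = 1.82 \pm 0.06$$

This means that intervals constructed this way contain the true mean in 95% of samples; informally, we can be 95% confident that the true mean height of UK men is between 1.76 and 1.88 m.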
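The height example as a short sketch. `t_confidence_interval` is a hypothetical helper, and the critical t (2.262 for 9 df) is passed in from tables rather than computed.

```python
import math

def t_confidence_interval(mean, sd, n, t_crit):
    """mean +/- t_crit * SEM, where t_crit is the two-tailed critical t for n - 1 df."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

low, high = t_confidence_interval(1.82, 0.08, 10, t_crit=2.262)  # 9 df, alpha = 0.05
print(f"95% CI: {low:.2f} to {high:.2f} m")
```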

Power: the probability of FINDING a GENUINE effect

[Figure: distributions of sample means if H0 is true (left-hand curve in each case) and if H1 is true (right-hand curve).] Setting α determines a cut-off at which we reject H0, and is one of the things that determines power.
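To make the two-curve picture concrete, here is a sketch for the simplest case: a one-tailed (upper) Z test with σ known. The IQ-style numbers and the H1 mean of 110 are hypothetical choices for illustration. Power is the area of the H1 distribution beyond the cut-off set by α; it equals α when H1 coincides with H0, and it grows as n increases.

```python
import math
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of a one-tailed (upper) Z test of H0: mu = mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)          # cut-off set by alpha
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))    # how far right the H1 curve sits, in SEMs
    return 1 - std_normal.cdf(z_crit - shift)

# Hypothetical: H0 mean 100, true mean 110, sigma 15
for n in (5, 20, 50):
    print(n, round(z_test_power(100, 110, 15, n), 3))
```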

Significance is not the same as effect size