Hypothesis Testing. Mean (SDM)

Confidence Intervals and Hypothesis Testing Readings: Howell, Ch. 4, 7 The Sampling Distribution of the Mean (SDM) Derivation - See Thorne & Giesen (T&G), pp. 169-171 or online Chapter Overview for Ch. 9 http://www.mhhe.com/thorne4 See also Howell, Chapter 4.2, 7.1

Getting to Know You Name, address, and phone number? With distributions, it s Central tendency The mean of the sampling distribution of M Variability The variance of the sampling distribution of M The standard deviation of the sampling distribution of M Shape Properties of the Sampling Properties of the Sampling Distribution of the Mean (SDM):

The Sampling Distribution of the Mean: How is it constructed? The sampling distribution of the mean (SDM) is constructed by drawing samples of some fixed size (say N = 10), calculating the mean of the samples, repeating the process for a large number of samples, and plotting the resulting means in a frequency polygon. The resulting distribution is the sampling distribution of the mean. Property 1 The population mean of the SDM equals the mean of the Parent Population. In symbols:

Property 2 The larger the size of the sample (N ) taken when constructing the SDM, the more closely the SDM will approximate a normal curve. If the parent population is normal, the SDM will be exactly normal. Property 3 The larger the size of the sample (N), the smaller the standard deviation of the SDM. The standard deviation of the sampling distribution of the mean is called the standard error of the mean. In symbols:

AKA The Central Limit Theorem Howell says, This is one of the most important theorems in statistics. (7.1) His version, p. 180: Also see this section for his illustration of the shape property. Rice Virtual Statistics i LbSi Lab Simulation i Forming Any z Score z = (score - mean of scores)/st dev of scores z (X) = z (M) = Finding M for a z - Formula 9-3 (T&G); Howell, 7.3 in context of Conf. Intervals.

Estimation and Degrees of Freedom Estimation for μ Estimation of sigma-squared or sigma N-1 in that case was the degrees of freedom General definition for degrees of freedom the number of values that are free to vary after certain restrictions have been placed on the data. Given that 5 scores must sum to zero, let s try it. See Howell, p. 187 for a more specific rationale. Estimating the Standard Error of the Mean Standard Error of the Mean formula: Estimated Standard Error of the Mean formula: Formula for z score: Formula when we estimate the standard error of the mean:

Summary Parameter Name Symbol Equivalence Sample Estimate Name Symbol Equivalence Population Mean of the SDM Sample Mean of the SDM Population Variance of the SDM Sample Variance of the SDM Population Standard Deviation of the SDM* Sample standard error of the mean The t distribution- A little history Ireland, the Guinness brewing company, and William Sealy Gosset t distribution developed to be used with small samples and unknown population variances Student st t -- rather than Gosset s s t-- so that the company wouldn t have to admit to a bad batch. More

t and z -- What s the difference? Sketch Role of the df Formula See Howell, Figure 7.5, p. 187. The Confidence Interval Finding deviant mean scores Sketch: Note The horizontal axis is the X axis\ The deviant scores are now means Formula (where M = X-bar) 95% CI = +/- t(.05/2) s(m) + M 99% CI = +/- t(.01/2) s(m) + M See also, Howell, 7.3, Confidence Interval on µ

Example: Checking Progress Solution Interpretations We can be 95% confident that the average digit span of the adult population p is at least 6.32 and at most 7.90 digits. We can be 99% confident that the average digit span of the adult population is at least 6.05 and at most 8.17 digits. Sketches

What the Confidence Interval Really Means Confidence -- not probability The sample has already been drawn For a 95% confidence interval, if the experiment were repeated 100 times, 95 of the intervals would contain (capture) the true population mean μ. μ The rub is, for any given one, we don t know forsureifitcapturesμ it μ or not, hence the confidence language. Confidence vs. Probablilty We place our conficence in the method rather than the interval. You can say your confidence is.95 that Mu is found in the interval. You re at risk if you say that the probability is.95 that Mu lies in the interval. Many will pounce if you do. See Howell,,p. 193.

Hypothesis Testing: One-Sample t Test Example Handout Ratings of Speech http://www2.msstate.edu/~jmg1/3103/one_s ample_ttest.pdf

Hypothesis Testing: One-Sample t Test How deviant or unlikely is a particular sample mean? Analogous to finding the deviant probability of a score with the z curve Start from knowledge or assumption of the parent population. We know from much previous intelligence research that the population p mean adult digit span is 7.0 (μ = 7). Another Example Research Context Does amphetamine (a stimulant) affect recent memory? 25 volunteers take a small dose then take the digit span test M = 7.53 and s = 0.97 What is the probability of a mean this deviant? Outline: M --> >t --> Area (or probability)

Set up 7-step procedure Solution We assumed our sample came from the adult population with average digit span of 7. Our sample mean was 7.53, and this was 2.73 standard error units away from the mean. S(M) = s/sqrt 0.97 = 0.194 Tcal = (7.53 7.00)/0.194 = 2.73 Tcritcal = t (25,.025) = 2.0639 (look-up) t(calculated) = 2.73, and the probability of a score this or more deviant was less than.05. A sample mean this deviant is very unlikely to have come from a population where μ = 7. Conclusion A one-sample t test was performed to determine if amphetamine affects adult digit span. The digit span for the amphetamine sample (M = 7.53) was significantly different from the normal adult digit span of 7, t(24) = 2.73,,p p <.05 (p <.02).) Amphetamine significantly enhances recent memory.

The Seven-Step Procedure An easy, organized way to test hypotheses and come to a conclusion. See T&G, pp. 190-191. mhhe.com/thorne4 Chapter 9 Review Gunsmoke: Errors in Hypothesis Testing Reality Cheating Decision Not Cheating H 0 : BB Honest H A : BB Cheating Error I: Shot Innocent Man Consequences? Correct Correct Error II: Continue to Play Consequences?

Errors in Hypothesis Testing Reality Decision Reject Fail to Reject (Claim an Effect) (No Claim) H 0 :True H 0 : False Type I Error Alpha Error (false claim) Consequences? Correct Correct Type II Error Beta Error (failure of detection) Consequences? Errors in Hypothesis Testing Signal Detection Theory Version Reality H 0 : True (No Sign.) H 0 : False (Yes Signal present) Is Signal Present? Decision Yes (Reject) (Claim an Sig. Present) Type I, Alpha Error Alpha Error (false positive false alarm Consequences? Correct Hit No Signal Fail to Reject (No Claim) Correct (Correct Rejection) Type II, β Error (False negative Miss ) (failure of detection) Consequences?

Consequences of Errors False Positive (false claim, Type I error) Gunsmoke: If you shoot an innocent man, you ll hang Death penalty trials avoid conviction c o of innocents. Evidence must by very very good. Hunters in the woods must not shoot everything that moves in the bushes. It could be a person or dog. Threshold for pulling the trigger must be raised until the signal is clearly present. In these situations, really want to avoid false claims. False negatives (failures of detection) not that bad Gunsmoke: Continue to play & loose $. Some criminals get lesser sentence or go free The hunter misses their game once in a while. False positives less harmful than False negatives Blood bank screening for AIDS (HIV) virus Test very sensitive; tends to detect HIV sometimes when it is not there (false positives; Type I errors) )but t always detect twhen it is there. Thus we eliminate failures of detection ( Misses ) ). Shoot and ask questions later. Cost of false positives some clean blood samples are wasted; a small price to pay for knowing gyou have no infected blood in YOUR transfusion!

The Almighty d prime Abiasfree bias-free measure of test sensitivity (under construction) A bias for claiming an effect ( shoot anything that moves ) will result in more false positives A bias for holding back on claiming a effect ( be sure before you shoot ) will result in more false negatives (Misses). d is the hit rate H the false alarm rate F when H and F are z-transformed: d = z(h) z(f), where H = P("yes" YES) and F = P("yes" NO) Colin Wilson has provided his Excel formula: d' = NORMINV(hitrate,0,1) - NORMINV(false-alarm-rate,0,1) where Excel's NORMINV "Returns the inverse of the normal cumulative distribution for the specified mean and standard deviation", 0 being the specified mean and 1 being the specified SD. See this link: http://www.linguistics.ucla.edu/faciliti/facilities/statistics/dprime.htm i i l /f ili i/f ili i / i i /d i h http://wise.cgu.edu/sdtmod/signal_applet.asp Power-Definitions Beta = Probability of a Type II error Probability of failure of detection Power = 1 - Beta Power = 1 - Probability of a failure of detection Power = Probability of detection of an effect

Why Power How hard would you work if the chance of getting paid =.20,.50,.80? In behavioral science research we want a good chance of ffinding an effect, if it is present (e.g., a cure for the common cold or a form of cancer, confirming YOUR thesis/dissertation hypothesis). Statistical power tells us our chance of finding an effect the Probability of detection of an effect Factors Affecting Power Alpha level Increasing the alpha level the level of power. Sample Size Increasing the sample size the level of power. Size of the effect (mu0 - mu1) The larger the effect, the the level of power.

Power Diagrams Text illustrations - Howell, 4.7 Note best shown in two curves: one curve for H 0 shows α and 1- α; when H0 is true Another curve for H 1 true shows β and 1- β See Howell s Figure 4.3 Spreadsheet illustration Power.xls What Should We Be Looking For? What statistical significance tells us What effect size tells us Practical significance Statistical ti ti significance ifi = statistical ti ti reliability How sure we are of the presence of some effect Practical significance ifi = effect size = importance! More Later.

Using SPSS One sample t test Confidence Interval Example from Ratings of Speech (enter 4.40 x 33 times, 0.51, 8.29. This simulates data for example.) Enter data; Use this syntax (or point-click) T-TEST /TESTVAL = 4 /VARIABLES = Ratings. * Comment: Set Testval = 0 to get CI on the mean. T-TEST /TESTVAL = 0 /VARIABLES = Ratings. One-Sample Statistics Ratings N Mean Std. Deviation Std. Error Mean 35 4.4000.94395.15956 One-Sample Test Ratings Test Value = 4 95% Confidence Interval of the Difference Mean t df Sig. (2-tailed) Difference Lower Upper 2.507 34.017.40000.0757.7243 * Comment: Set Testval = 0togetCIonthemean. on the T-TEST /TESTVAL = 0 /VARIABLES = Ratings. One-Sample Test Ratings Test Value = 0 95% Confidence Interval of the Difference Mean t df Sig. (2-tailed) (2tild) Difference Lower Upper 27.576 34.000 4.40000 4.0757 4.7243

Lab Assignment 1 Review questions and formulae Formulae for #2. Help on setup for #5. SPSS startup