BIOSTATS 640 Intermediate Biostatistics Frequently Asked Questions Topic 1 FAQ 1 Review of BIOSTATS 540 Introductory Biostatistics

1. I'm confused about the jargon and notation, especially population versus sample. Could you please clarify?

In BIOSTATS 540, we were introduced to the notion of there being a population in the background: a source population about which we would like to learn. We were also introduced to the idea of a sample drawn from that population: a collection of actual, observed, and known data values that we will use to draw some inferences about the population. We were reminded that, in real life, we typically do not have the luxury of examining the entirety of a population (that would be a census).

As regards jargon and notation, the convention is to use Greek letters to represent characteristics of the source population (we referred to these as parameters and parameter values) and Roman letters to represent characteristics of the sample. Reminder: a statistic is just a number that you calculate from the data in a sample. So here is a little refresher schematic that we might have compiled in BE540 so as to keep track:

                 Parameter in Population     Estimate from Sample
    Mean         $\mu$                       $\bar{X}$
    Variance     $\sigma^2$                  $S^2$
    etc.

Here in BIOSTATS 640, when we learn about regression and correlation, a similar compilation allows us to keep track of what's what. Keep in mind that, typically, the statistic we calculate is calculated as our guess of the parameter in the population.

                                    Parameter in Population     Estimate from Sample
    Slope of line of Y on X         $\beta_1$                   $\hat{\beta}_1$ or $b_1$
    Intercept of line of Y on X     $\beta_0$                   $\hat{\beta}_0$ or $b_0$

Additional note to reader: See the little hat on top? Whenever you see the little hat on top, this is telling you that what you are looking at is an estimate. It's unfortunate that the letter is Greek, but the key is to notice that the little hat means the quantity is an estimate obtained from the data and is therefore a statistic. The little hat also goes by the name caret. Whenever you see it, think estimate.

FAQTopic_.docx Page of 5
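To make the parameter-versus-statistic distinction from question 1 concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the population is simulated, and the values of `MU`, `SIGMA`, the sample size of 25, and the seed are hypothetical choices, not figures from the course.

```python
import random
import statistics

random.seed(640)  # arbitrary seed so the sketch is reproducible

# Hypothetical source population. In real life we never get to see all of it.
MU, SIGMA = 120.0, 15.0  # assumed parameter values (Greek letters)
population = [random.gauss(MU, SIGMA) for _ in range(100_000)]

# A sample: the actual, observed, known data values.
sample = random.sample(population, 25)

# Statistics (Roman letters): numbers calculated from the data in the sample.
x_bar = statistics.mean(sample)    # X-bar, our guess of mu
s2 = statistics.variance(sample)   # S^2, our guess of sigma^2 (divisor n - 1)

print(f"parameter mu      = {MU:.1f}   estimate X-bar = {x_bar:.2f}")
print(f"parameter sigma^2 = {SIGMA ** 2:.1f}   estimate S^2   = {s2:.2f}")
```

The estimates will generally land near, but not exactly on, the parameter values, which is the whole point of calling them estimates.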

2. Remind me again of the distinction between standard deviation (SD or S) and standard error (SEM or SE) and how this is related to the distinction between populations versus sampling distributions.

SD or S - Standard deviation is the square root of the variance of values of individuals in nature. The collection of all possible individuals in nature goes by the name population.

SEM or SE - Standard error is the square root of the variance of values of a statistic. The collection of all possible values of a statistic (imagine replicating your study over and over a gazillion times and compiling the collection of all possible sample means $\bar{X}$) does not go by the name population, even though this would make sense. Instead, this collection of all possible values of whatever statistic you're interested in goes by the name sampling distribution.

So who cares? Well, actually there are times when we are very interested in the sampling distribution of $\bar{X}$ (e.g., clinical trials). And there are times when we might be interested in the sampling distribution of the sample variance $S^2$ (e.g., studies of lab performance). By extension, we can imagine that there might be times when we're interested in some other statistic. In our unit on regression and correlation, for example, we will see that we are interested in the sampling distribution of an estimated slope, $\hat{\beta}_1$.

I guess I don't see why I'd be interested in the sampling distribution of $\hat{\beta}_1$.

You're interested in the sampling distribution of $\hat{\beta}_1$ when you're interested in what another investigator might obtain as a $\hat{\beta}_1$ if he/she were to repeat your study and come up with his/her own estimate. Whether you're aware of this consciously or not, this is the kind of thing you are interested in (generalizability and robustness are some familiar terms for this) when you read a journal article and want to know if you would obtain similar findings if you were to repeat the published study in your own sample of folks.
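The "replicate your study a gazillion times" idea from question 2 can be imitated by simulation. The sketch below is only an illustration under assumed values: `MU`, `SIGMA`, `N`, the number of replicates, and the seed are all hypothetical, and it leans on the standard result that the standard error of a sample mean is $\sigma/\sqrt{n}$.

```python
import random
import statistics

random.seed(540)  # arbitrary seed so the sketch is reproducible

MU, SIGMA, N = 50.0, 10.0, 25   # assumed population mean, SD, and sample size
REPLICATES = 5_000              # stand-in for "a gazillion" repeated studies

def one_study_mean():
    """Draw one sample of size N from Normal(MU, SIGMA^2); return its mean."""
    return statistics.mean([random.gauss(MU, SIGMA) for _ in range(N)])

# Approximate sampling distribution of X-bar: one value per replicated study.
sample_means = [one_study_mean() for _ in range(REPLICATES)]

se_simulated = statistics.stdev(sample_means)  # spread of the statistic
se_theory = SIGMA / N ** 0.5                   # sigma / sqrt(n)

print(f"SD of individuals in the population: {SIGMA:.2f}")
print(f"SE of X-bar, from the simulation:    {se_simulated:.2f}")
print(f"SE of X-bar, from sigma / sqrt(n):   {se_theory:.2f}")
```

The SD describes how much individuals vary, while the SE describes how much the statistic varies from one repetition of the study to the next. With these assumed values the SE is much smaller than the SD.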

3. Ick. I don't understand summation notation.

Unfortunately, notation does get in the way of understanding ideas sometimes. The summation notation is nothing more than a secretarial convenience. We use it to avoid having to write out long expressions.

For example, instead of writing $x_1 + x_2 + x_3 + x_4 + x_5$, we write

$$\sum_{i=1}^{5} x_i$$

Another example: instead of writing $x_1 * x_2 * x_3 * x_4 * x_5$, we write

$$\prod_{i=1}^{5} x_i$$

This is actually an example of the product notation.

Key to the summation notation: The Greek symbol sigma says add up some items. Below the sigma symbol is the starting point (STARTING HERE: $i = 1$). Up on top is the ending point (END: 5).
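If it helps, the same shorthand from question 3 can be seen in code. In this small Python sketch the five data values are made up for illustration. Note that Python counts positions from 0, so $x_1$ is stored as `x[0]`.

```python
import math

x = [2, 4, 6, 8, 10]  # hypothetical values x_1, x_2, x_3, x_4, x_5

# The long way: write out every term.
total_long = x[0] + x[1] + x[2] + x[3] + x[4]

# Summation notation: start at i = 1, end at i = 5, add up the x_i.
total = sum(x[i - 1] for i in range(1, 6))

# Product notation: same start and end, but multiply instead of add.
product = math.prod(x[i - 1] for i in range(1, 6))

print(total_long, total, product)
```

Here `range(1, 6)` plays the role of the starting point below the sigma and the ending point on top: it runs i from 1 through 5.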

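4. What are Z-scores, what are t-scores, and what is the distinction between them?

The Z-score is a tool to compute probabilities of intervals of values for X distributed Normal($\mu$, $\sigma^2$).

Suppose it is of interest to calculate a probability for a random variable X that is distributed Normal($\mu$, $\sigma^2$). Sometimes (less so as time goes on because internet resources are getting better all the time), we're in a pickle because tabulated normal probabilities are available only for the Normal distribution with $\mu = 0$ and $\sigma^2 = 1$. We solve our problem by exploiting an equivalence argument. The technique goes by the name standardization and involves replacing the desired calculation with an equivalent one for a new random variable called a z-score. Standardization expresses the desired calculation for X distributed Normal($\mu$, $\sigma^2$) as an equivalent calculation for Z (Z is now called a Z-score), where Z is distributed standard normal, Normal(0, 1):

$$\text{pr}[\, a \le X \le b \,] = \text{pr}\left[ \frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma} \right]$$

Thus,

$$\text{Z-score} = \frac{X - \mu}{\sigma}$$

Note - The technique of standardization of X involves centering (by subtraction of the mean of X, which is $\mu$) followed by rescaling (using the multiplier $1/\sigma$). Watch out when you are performing standardization that the rescaling uses the correct standard deviation or standard error, that is, the square root of the correct variance. Here are 3 examples followed by a generic, just to be sure that you get the idea:

1. For an individual value X:

$$\text{pr}[\, a \le X \le b \,] = \text{pr}\left[ \frac{a - \mu}{\sigma} \le \text{Z-score} \le \frac{b - \mu}{\sigma} \right]$$

2. For a sample mean $\bar{X}$:

$$\text{pr}[\, a \le \bar{X} \le b \,] = \text{pr}\left[ \frac{a - \mu}{\sigma/\sqrt{n}} \le \text{Z-score} \le \frac{b - \mu}{\sigma/\sqrt{n}} \right]$$

3. For an estimated slope $\hat{\beta}_1$:

$$\text{pr}[\, a \le \hat{\beta}_1 \le b \,] = \text{pr}\left[ \frac{a - E(\hat{\beta}_1)}{SE(\hat{\beta}_1)} \le \text{Z-score} \le \frac{b - E(\hat{\beta}_1)}{SE(\hat{\beta}_1)} \right]$$

4. Generic, for any statistic:

$$\text{pr}[\, a \le \text{statistic} \le b \,] = \text{pr}\left[ \frac{a - E(\text{statistic})}{SE(\text{statistic})} \le \text{Z-score} \le \frac{b - E(\text{statistic})}{SE(\text{statistic})} \right]$$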
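Here is a small Python sketch of standardization at work, covering the individual-value and sample-mean cases from question 4. The numbers (`MU`, `SIGMA`, `N`, and the interval endpoints) are assumptions chosen for illustration, and the helper names `standard_normal_cdf` and `normal_interval_probability` are hypothetical. The standard normal probabilities come from the error function in Python's `math` module rather than from a printed table.

```python
import math

def standard_normal_cdf(z):
    """pr[Z <= z] for Z distributed Normal(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_interval_probability(a, b, center, scale):
    """pr[a <= variable <= b]: center, rescale, then use the Normal(0, 1) CDF."""
    z_a = (a - center) / scale
    z_b = (b - center) / scale
    return standard_normal_cdf(z_b) - standard_normal_cdf(z_a)

MU, SIGMA, N = 100.0, 15.0, 9  # assumed mean, SD, and sample size

# Individual X: rescale with sigma. Here z runs from -1 to +1.
p_individual = normal_interval_probability(85, 115, MU, SIGMA)

# Sample mean X-bar: rescale with sigma / sqrt(n) = 5. Again z runs from -1 to +1.
p_mean = normal_interval_probability(95, 105, MU, SIGMA / math.sqrt(N))

print(f"pr[85 <= X <= 115]     = {p_individual:.4f}")
print(f"pr[95 <= X-bar <= 105] = {p_mean:.4f}")
```

Both intervals standardize to the same z-scores, so the two probabilities agree. The only thing that changed is which scale was used for the rescaling, which is exactly the thing to watch out for.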

The z-score method is appropriate under 2 circumstances: (1) when the starting variable is distributed Normal to begin with, and (2) when the starting variable can be appreciated as an instance of the central limit theorem (not discussed here).

A t-score is a Student's t random variable. There are lots of ways to have a random variable that is distributed Student's t. One is to conceive of a Student's t random variable as a t-score and, in this way, as analogous to a z-score.

One definition of a Student's t random variable: In the setting of a random sample $X_1, \ldots, X_n$ of independent, identically distributed outcomes of a Normal($\mu$, $\sigma^2$) distribution, where we calculate $\bar{X}$ and $S^2$ in the usual way:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \qquad \text{and} \qquad S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

a Student's t distributed random variable results if we construct a t-score instead of a z-score:

$$\text{t-score} = t_{DF = n-1} = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

is distributed Student's t with degrees of freedom = (n - 1).

Note - If we want to standardize $\bar{X}$, the solution depends on whether we know its SE or we don't.

                                     $SE(\bar{X})$ is known                           $SE(\bar{X})$ is NOT known
    Standardization of $\bar{X}$     Z-score $= \frac{\bar{X} - \mu}{SE(\bar{X})}$    t-score $= \frac{\bar{X} - \mu}{\widehat{SE}(\bar{X})}$
    Where                            $SE(\bar{X}) = \sigma/\sqrt{n}$                  $\widehat{SE}(\bar{X}) = S/\sqrt{n}$

Recall

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
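The known-SE versus unknown-SE distinction can be sketched in a few lines of Python. The data values, the hypothesized mean `MU_0`, and the "known" `SIGMA` below are all made up for illustration. The sketch only constructs the two scores and does not look up any probabilities.

```python
import math
import statistics

sample = [4.1, 5.3, 4.8, 5.9, 5.2, 4.6, 5.5, 5.0, 4.4]  # hypothetical data
MU_0 = 5.0    # assumed value of mu
SIGMA = 0.6   # assumed known sigma (needed for the z-score only)

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)   # S: square root of S^2, divisor n - 1

# SE known: rescale with sigma / sqrt(n) and refer to Normal(0, 1).
z_score = (x_bar - MU_0) / (SIGMA / math.sqrt(n))

# SE NOT known: rescale with S / sqrt(n) and refer to Student's t.
t_score = (x_bar - MU_0) / (s / math.sqrt(n))
df = n - 1                     # degrees of freedom

print(f"n = {n}, X-bar = {x_bar:.3f}, S = {s:.3f}")
print(f"z-score = {z_score:.3f}")
print(f"t-score = {t_score:.3f} with DF = {df}")
```

The numerators are identical. The two scores differ only in whether the denominator uses the known $\sigma$ or its estimate S from the sample.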