October 25, 2018 BIM 105 Probability and Statistics for Biomedical Engineers 1


Population Parameters and Sample Statistics

Inferences from Two Samples

Most of the time, what we want to know is about differences.

Does a treatment prolong the life of patients with a certain kind of cancer? By how much?
Does a growth medium with more calcium make stronger engineered cartilage? By how much?
Does a manufacturing method for a medical device result in fewer defective devices? By how much?

We want to test for these questions, expecting any deviation to be in the indicated direction, but knowing that it may work the other way. Thus, here too we always use two-sided tests and intervals.

The null hypothesis almost always is the hypothesis that the difference is zero, aka null.

We use confidence intervals to answer the "how much" question. We want to make confidence intervals for differences of means etc.

What are we assuming? vs. What can we do about it?

The statistical procedures we use in practice, the ones we learn about in this class, are derived using a series of mathematical assumptions. None of them is exactly true. Ever.

We need to know which properties are most crucial, and how far off they can be and the procedure still be valid.

We also need to know what to do about it if it looks like there is a problem.

Confidence Intervals and Tests for a Difference of Means

We have a series of measurements $X_1, X_2, \ldots, X_{n_X}$ and $Y_1, Y_2, \ldots, Y_{n_Y}$ with

$$X_i \sim N(\mu_X, \sigma_X^2), \qquad Y_i \sim N(\mu_Y, \sigma_Y^2)$$

Each value of $X_i$ has the same mean and the same variance. (Identically distributed)
Different values of $X_i$ are statistically independent. (Independence)
Each $X_i$ is normally distributed. (Normality)

Each value of $Y_i$ has the same mean and the same variance. (Identically distributed)
Different values of $Y_i$ are statistically independent. (Independence)
Each $Y_i$ is normally distributed. (Normality)

Values of $X$ and $Y$ are statistically independent. (Independence)

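Confidence Intervals and Tests for a Difference of Means

We have a series of measurements $X_1, X_2, \ldots, X_{n_X}$ and $Y_1, Y_2, \ldots, Y_{n_Y}$.

$$X_i \sim N(\mu_X, \sigma_X^2), \qquad E(\bar X) = \mu_X, \qquad V(\bar X) = \sigma_X^2 / n_X$$

$$Y_i \sim N(\mu_Y, \sigma_Y^2), \qquad E(\bar Y) = \mu_Y, \qquad V(\bar Y) = \sigma_Y^2 / n_Y$$

$$E(\bar X - \bar Y) = \mu_X - \mu_Y, \qquad V(\bar X - \bar Y) = \sigma_X^2 / n_X + \sigma_Y^2 / n_Y$$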

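Confidence Intervals and Tests for a Difference of Means

$$E(\bar X - \bar Y) = \mu_X - \mu_Y, \qquad V(\bar X - \bar Y) = \sigma_X^2 / n_X + \sigma_Y^2 / n_Y$$

$$\frac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}{\sqrt{\sigma_X^2 / n_X + \sigma_Y^2 / n_Y}} = Z$$

$$\frac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}{\sqrt{s_X^2 / n_X + s_Y^2 / n_Y}} = t_\nu$$

(We will show the degrees of freedom $\nu$ later. Often near $(n_X - 1) + (n_Y - 1)$.)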

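Confidence Intervals and Tests for a Difference of Means

$$\frac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}{\sqrt{\sigma_X^2 / n_X + \sigma_Y^2 / n_Y}} = Z, \qquad \frac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}{\sqrt{s_X^2 / n_X + s_Y^2 / n_Y}} = t_\nu$$

A test that the difference in means is zero can be based on this statistic, with $\mu_X - \mu_Y$ set to 0.

If the sample sizes are large, we can simply use the normal distribution. If not, we use

$$\nu = \frac{\left(s_X^2 / n_X + s_Y^2 / n_Y\right)^2}{\left(s_X^2 / n_X\right)^2 / (n_X - 1) + \left(s_Y^2 / n_Y\right)^2 / (n_Y - 1)}$$

If the samples are of equal size $n$, then this reduces to

$$\nu = \frac{(n - 1)\left(s_X^2 + s_Y^2\right)^2}{\left(s_X^2\right)^2 + \left(s_Y^2\right)^2}$$

and if the sample variances are both equal to $s^2$, this is

$$\nu = 2(n - 1)$$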
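The degrees-of-freedom formula and its two special cases can be checked numerically. The following is a minimal Python sketch using only the standard library; the function names `welch_df` and `welch_df_equal_n` and the sample variances are made up for illustration and are not from the lecture.

```python
def welch_df(var_x, n_x, var_y, n_y):
    """Welch-Satterthwaite approximate degrees of freedom."""
    a = var_x / n_x  # estimated variance of the first sample mean
    b = var_y / n_y  # estimated variance of the second sample mean
    return (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))


def welch_df_equal_n(var_x, var_y, n):
    """Reduced form of the same formula when both samples have size n."""
    return (n - 1) * (var_x + var_y) ** 2 / (var_x ** 2 + var_y ** 2)


# Illustrative (hypothetical) sample variances with equal sample sizes.
general = welch_df(4.0, 10, 9.0, 10)
reduced = welch_df_equal_n(4.0, 9.0, 10)

# Equal sizes and equal sample variances give 2(n - 1).
equal_var = welch_df(5.0, 10, 5.0, 10)

print(general, reduced, equal_var)
```

The general and reduced forms agree when the sample sizes match, and the degrees of freedom reach 2(n - 1) only when the sample variances are also equal; otherwise the value is smaller.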

Confidence Intervals and Tests for a Difference of Means

$$\frac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}{\sqrt{s_X^2 / n_X + s_Y^2 / n_Y}} = t_\nu$$

A confidence interval for the difference in means is

$$(\bar X - \bar Y) \pm t_{\nu, \alpha/2} \sqrt{s_X^2 / n_X + s_Y^2 / n_Y}$$

If the sample sizes are large, we can simply use $z_{\alpha/2}$; otherwise we use

$$\nu = \frac{\left(s_X^2 / n_X + s_Y^2 / n_Y\right)^2}{\left(s_X^2 / n_X\right)^2 / (n_X - 1) + \left(s_Y^2 / n_Y\right)^2 / (n_Y - 1)}$$

You may need this formula for the homework, but not on an exam. It is calculated for you by MATLAB and other statistical software, so you won't usually need it in practice.

Small-Sample Intervals/Tests for a Difference of Means

This interval/test is called the Welch-Satterthwaite interval/test, or the Welch interval/test, or just the two-sample t-interval/test. It is based on an approximation, not on exact theory, but it is a very good approximation.

There is another interval/test which is exact when the two population variances are equal. This should probably never be used.

The Welch interval/test performs almost as well as the equal variance interval/test when the variances truly are equal.

The equal variance interval/test can be seriously wrong when the true population variances differ by a lot, particularly if the sample sizes are also disparate.

So the equal variance interval/test is a bad gamble.

Effect of C-ABC on the Strength of Engineered Cartilage

We have measurements in kPa of the tensile strength of pieces of engineered cartilage either with or without the addition of an enzyme called Chondroitinase ABC (C-ABC).

The 7 control samples measured 364, 224, 183, 165, 163, 275, 293.
The 6 treated samples measured 462, 747, 571, 599, 373, 413.

How good is the evidence that the treated samples differ from the controls?
What is a 95% confidence interval for the increase in tensile strength from adding C-ABC?


Effect of C-ABC on the Strength of Engineered Cartilage

The $n_X = 6$ treated samples measured 462, 747, 571, 599, 373, 413.

$$\bar X = 527.44, \qquad s_X = 138.93, \qquad s_X^2 = 19300$$

The $n_Y = 7$ control samples measured 364, 224, 183, 165, 163, 275, 293.

$$\bar Y = 238.15, \qquad s_Y = 75.79, \qquad s_Y^2 = 5744$$

The two-sample t-interval for the difference in tensile strength is

$$527.44 - 238.15 \pm t_{\nu, 0.025} \sqrt{19300 / 6 + 5744 / 7}$$
$$289.29 \pm (2.3646)(63.54)$$
$$289.29 \pm 150.25$$

We obtain the t percentage point as follows:

$$\nu = \frac{(3217 + 820.57)^2}{(3217)^2 / 5 + (820.57)^2 / 6} = 7.47$$

$$\nu = 7: \quad t_{\nu, 0.025} = 2.3646 \qquad\qquad \nu = 7.47: \quad t_{\nu, 0.025} = 2.3348$$

You need to round $\nu$ down if you use a table, not a computer.
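The interval arithmetic above can be reproduced from the listed data. This is a minimal Python sketch using only the standard library, not the lecture's MATLAB workflow. Two things are assumptions: the data are the rounded values shown on the slide, so the summary statistics differ slightly from the quoted 527.44 and 238.15, and the critical value 2.3348 is copied from the slide because the standard library has no t quantile function. The name `welch_summary` is made up for illustration.

```python
import math
import statistics

# Rounded tensile strengths (kPa) as listed on the slide.
treated = [462, 747, 571, 599, 373, 413]
control = [364, 224, 183, 165, 163, 275, 293]


def welch_summary(x, y):
    """Return (difference of means, standard error, Welch degrees of freedom)."""
    n_x, n_y = len(x), len(y)
    a = statistics.variance(x) / n_x  # s_X^2 / n_X
    b = statistics.variance(y) / n_y  # s_Y^2 / n_Y
    diff = statistics.mean(x) - statistics.mean(y)
    se = math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))
    return diff, se, df


diff, se, df = welch_summary(treated, control)

# Assumption: t percentage point for nu = 7.47 taken from the slide.
T_CRIT = 2.3348
lower, upper = diff - T_CRIT * se, diff + T_CRIT * se

print(f"difference = {diff:.2f}, se = {se:.2f}, df = {df:.2f}")
print(f"95% CI = ({lower:.1f}, {upper:.1f})")
```

With the unrounded degrees of freedom the interval is somewhat narrower than the table-based one that uses 2.3646, which is the price of rounding the degrees of freedom down to 7.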

Effect of C-ABC on the Strength of Engineered Cartilage

The $n_X = 6$ treated samples measured 462, 747, 571, 599, 373, 413.

$$\bar X = 527.44, \qquad s_X = 138.93, \qquad s_X^2 = 19300$$

The $n_Y = 7$ control samples measured 364, 224, 183, 165, 163, 275, 293.

$$\bar Y = 238.15, \qquad s_Y = 75.79, \qquad s_Y^2 = 5744$$

The t-statistic to test for a difference is

$$t = \frac{527.44 - 238.15}{\sqrt{19300 / 6 + 5744 / 7}} = \frac{289.29}{63.54} = 4.553$$

$$\nu = \frac{(3217 + 820.57)^2}{(3217)^2 / 5 + (820.57)^2 / 6} = 7.47$$

$$\nu = 7: \quad p = 0.0026 \qquad\qquad \nu = 7.47: \quad p = 0.0022$$

You need to round $\nu$ down if you use a table, not a computer.
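The two p-values can be checked without a table. Since the Python standard library has no t distribution, the sketch below integrates the t density numerically with Simpson's rule; this is an illustrative substitute for the table or MATLAB lookup, not how the lecture computes it. The names `t_pdf` and `two_sided_p` are made up, and the inputs 289.29, 63.54, 7 and 7.47 are the slide's values.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution; df may be non-integer."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value 2*P(T > |t|), using Simpson's rule on [0, |t|].

    steps must be even.
    """
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 2 * (0.5 - area)


t_stat = 289.29 / 63.54
p_welch = two_sided_p(t_stat, 7.47)  # computer: fractional degrees of freedom
p_table = two_sided_p(t_stat, 7)     # table: degrees of freedom rounded down

print(f"t = {t_stat:.3f}, p (nu = 7.47) = {p_welch:.4f}, p (nu = 7) = {p_table:.4f}")
```

Rounding the degrees of freedom down gives the slightly larger p-value, so the table-based answer is the more conservative of the two.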

Assumptions: Independence

The assumption that all of the observations are statistically independent can fail in a number of ways.

Suppose the engineered cartilage is grown in the wells of 12-well microplates. If the 7 control observations were derived from two different microplates, the observations on the same plate may be more related than observations on different ones.

If the 13 total observations occurred on plate 1 (4 control/3 treated) and plate 2 (3 control/3 treated), then $X$ and $Y$ may not be independent.

We can add a factor as to which plate the observation came from.

Assumptions: Normality

If data are obviously skewed, then it may be better to analyze the logs. This usually only matters when the data cover a wide range, at least a factor of 5.

If there is an apparent outlier, then it should be investigated, but omitting data because it does not fit preconceptions is a good way to find out things that ain't so.

Otherwise, it is ok to proceed without a formal test of normality. Small sample tests of normality have almost no power. Large samples do not require normality.

Assumptions: Equal Variance

If we could assume that the population variance of $X$ and the population variance of $Y$ are the same, then there is another version of the t-test for a difference in means. This is hardly ever the best option.

If $\sigma_X = \sigma_Y$ then it is slightly better. If $\sigma_X \neq \sigma_Y$ then it can be a lot worse, so there is little to gain and much to lose.

In small to moderate size samples, you can't tell if the variances differ. In large samples, we always use the form that does not assume equal variances anyway.

Bottom line: always use the version of the two-sample t-test that does not assume equality of variances.

MATLAB

    >> control'
       364.1265  223.7607  183.0584  165.3209  162.8021  274.7644  293.1934
    >> treat'
       461.5631  747.0499  570.8312  599.1513  373.4106  412.6264
    >> [h,p,ci,stats] = ttest2(treat,control,'vartype','unequal')

The last two entries are necessary since the default in MATLAB is to assume equal variance.

MATLAB

    >> [h,p,ci,stats] = ttest2(treat,control,'vartype','unequal')
    h = 1                   % Decision on test of difference
    p = 0.0022              % p-value for the test, null = 0
    ci = 140.9449           % 95% CI for difference of means
         437.6393
    stats = tstat: 4.5530
               df: 7.4712
               sd: [2x1 double]
    >> getfield(stats,'sd') % how to get the actual sample standard deviations
       138.9250
        75.7892

Individual Confidence Intervals vs. Confidence Intervals for the Difference

For each of two samples, we have a sample mean and sample standard deviation. Suppose that the sample standard deviation is 10 in both cases and that the sample size is 25 in both cases.

$$\bar x_1 \pm 1.96\, s_1 / \sqrt{n_1} = \bar x_1 \pm 3.92$$
$$\bar x_2 \pm 1.96\, s_2 / \sqrt{n_2} = \bar x_2 \pm 3.92$$

The intervals just touch if the two sample means are separated by 2(3.92) = 7.84.

Individual Confidence Intervals vs. Confidence Intervals for the Difference

$$\bar x_1 \pm 3.92, \qquad \bar x_2 \pm 3.92$$

The intervals just touch if the two sample means are separated by 2(3.92) = 7.84.

$$\bar x_1 - \bar x_2 \pm 1.96 \sqrt{s_1^2 / n_1 + s_2^2 / n_2}$$
$$\bar x_1 - \bar x_2 \pm 1.96 \sqrt{8}$$
$$\bar x_1 - \bar x_2 \pm 5.54$$

But for a hypothetical difference to be within the 95% confidence limits, it has to be less than 5.54 away from the observed difference $\bar x_1 - \bar x_2$.

So, you can't examine the possible values of differences by making a CI for each one.

If the two CIs don't overlap, then the means are statistically different. If the CIs overlap, then the means may be statistically different or not.
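The gap between 5.54 and 7.84 is where overlapping intervals mislead. A minimal Python sketch of the arithmetic, using the slide's numbers (s = 10, n = 25, z = 1.96); the helper `compare` and the trial separations 5.0, 6.5 and 8.0 are made up for illustration.

```python
import math

Z = 1.96
s, n = 10.0, 25

half_single = Z * s / math.sqrt(n)                    # half-width of each individual CI
touch_gap = 2 * half_single                           # separation at which the two CIs just touch
half_diff = Z * math.sqrt(s ** 2 / n + s ** 2 / n)    # half-width of the CI for the difference


def compare(gap):
    """For a given separation of the sample means, return
    (individual CIs overlap?, CI for the difference excludes zero?)."""
    overlap = gap < touch_gap
    significant = gap > half_diff
    return overlap, significant


for gap in (5.0, 6.5, 8.0):
    print(gap, compare(gap))
```

A separation of 6.5 gives overlapping individual intervals even though the interval for the difference excludes zero, which is exactly the case where reading significance off the individual intervals goes wrong.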

Difference of Proportions

Suppose we have two methods of inducing stem cell properties in fibroblasts, the standard method, and a new method. In an experiment, 5 out of 200 cells were converted using the standard method and 12 out of 250 were converted using the new method.

We want to provide a 95% confidence interval for the difference in proportions converted.
We want to test the hypothesis that the new method is the same as the old one.

$$E(\hat p_{\text{new}}) = p_{\text{new}}, \qquad V(\hat p_{\text{new}}) = p_{\text{new}}(1 - p_{\text{new}}) / n_{\text{new}}$$

$$E(\hat p_{\text{standard}}) = p_{\text{standard}}, \qquad V(\hat p_{\text{standard}}) = p_{\text{standard}}(1 - p_{\text{standard}}) / n_{\text{standard}}$$

A test of the hypothesis of equality of proportions is given by

$$z = \frac{(\hat p_{\text{new}} - \hat p_{\text{standard}}) - 0}{\sqrt{\hat p_{\text{pooled}}(1 - \hat p_{\text{pooled}})(1 / n_{\text{new}} + 1 / n_{\text{standard}})}}$$

where

$$\hat p_{\text{pooled}} = \frac{x_{\text{new}} + x_{\text{standard}}}{n_{\text{new}} + n_{\text{standard}}} = \frac{17}{450} = 0.03778$$

$$\hat p_{\text{new}} - \hat p_{\text{standard}} = 12/250 - 5/200 = 0.023$$

$$\sqrt{(0.03778)(0.96222)(1/200 + 1/250)} = 0.01809$$

$$z = 0.023 / 0.01809 = 1.2716$$

The p-value of the test is p = 2(0.1018) = 0.2035. This is consistent with the proportions being equal.
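The pooled test can be written as a few lines of code. This is a minimal Python sketch using the standard library's `statistics.NormalDist` for the normal tail area; the function name `two_proportion_z_test` is made up for illustration, and the counts are the ones from the example.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x_new, n_new, x_std, n_std):
    """Two-sided z-test of equal proportions using the pooled standard error."""
    p_new = x_new / n_new
    p_std = x_std / n_std
    # Under the null, all observations share one p, estimated from both samples.
    p_pool = (x_new + x_std) / (n_new + n_std)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_new + 1 / n_std))
    z = (p_new - p_std) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = two_proportion_z_test(12, 250, 5, 200)
print(f"z = {z:.4f}, p = {p_value:.4f}")
```

The pooled estimate is used only because the null hypothesis says the two proportions are equal; a confidence interval for the difference does not make that assumption.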

Under the null, the two proportions are equal, so we have 200 + 250 = 450 observations all with the same unknown value of p. Of those, 5 + 12 = 17 were successfully converted, so

$$\hat p_{\text{pooled}} = \frac{x_{\text{new}} + x_{\text{standard}}}{n_{\text{new}} + n_{\text{standard}}} = \frac{17}{450} = 0.03778$$

Under the null, both samples come from the same binomial distribution.

This is different from the confidence interval, where we are trying to measure the difference in proportions, but don't assume it is zero.

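Difference of Proportions Confidence Interval

$$E(\hat p_{\text{new}} - \hat p_{\text{standard}}) = p_{\text{new}} - p_{\text{standard}}$$

$$V(\hat p_{\text{new}} - \hat p_{\text{standard}}) = \frac{p_{\text{new}}(1 - p_{\text{new}})}{n_{\text{new}}} + \frac{p_{\text{standard}}(1 - p_{\text{standard}})}{n_{\text{standard}}}$$

A $100(1 - \alpha)\%$ confidence interval for the difference is

$$\hat p_{\text{new}} - \hat p_{\text{standard}} \pm z_{\alpha/2} \sqrt{\frac{p_{\text{new}}(1 - p_{\text{new}})}{n_{\text{new}}} + \frac{p_{\text{standard}}(1 - p_{\text{standard}})}{n_{\text{standard}}}}$$

which we estimate by

$$\hat p_{\text{new}} - \hat p_{\text{standard}} \pm z_{\alpha/2} \sqrt{\frac{\hat p_{\text{new}}(1 - \hat p_{\text{new}})}{n_{\text{new}}} + \frac{\hat p_{\text{standard}}(1 - \hat p_{\text{standard}})}{n_{\text{standard}}}}$$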

$$\hat p_{\text{new}} - \hat p_{\text{standard}} \pm z_{\alpha/2} \sqrt{\frac{\hat p_{\text{new}}(1 - \hat p_{\text{new}})}{n_{\text{new}}} + \frac{\hat p_{\text{standard}}(1 - \hat p_{\text{standard}})}{n_{\text{standard}}}}$$

$$12/250 - 5/200 = 0.048 - 0.025 = 0.023$$

$$\sqrt{(0.048)(0.952)/250 + (0.025)(0.975)/200} = 0.01745$$

$$0.023 \pm (1.960)(0.01745) = 0.023 \pm 0.0342$$

$$(-0.0112,\ 0.0572)$$
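The same interval can be computed directly from the counts. A minimal Python sketch using only the standard library; the function name `diff_proportion_ci` is made up for illustration.

```python
import math
from statistics import NormalDist


def diff_proportion_ci(x_new, n_new, x_std, n_std, conf=0.95):
    """CI for p_new - p_std using the unpooled standard error."""
    p_new, p_std = x_new / n_new, x_std / n_std
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}, about 1.96 for 95%
    diff = p_new - p_std
    return diff - z * se, diff + z * se


lower, upper = diff_proportion_ci(12, 250, 5, 200)
print(f"95% CI = ({lower:.4f}, {upper:.4f})")
```

The interval includes zero, which agrees with the two-sided test not rejecting equality of the two proportions.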

For a confidence interval, we use a standard error of the difference of

$$\sqrt{\frac{\hat p_{\text{new}}(1 - \hat p_{\text{new}})}{n_{\text{new}}} + \frac{\hat p_{\text{standard}}(1 - \hat p_{\text{standard}})}{n_{\text{standard}}}}$$

which does not assume that the proportions are equal.

For a hypothesis test of no difference, we assume the proportions are equal and use the standard error

$$\sqrt{\hat p_{\text{pooled}}(1 - \hat p_{\text{pooled}})(1 / n_{\text{new}} + 1 / n_{\text{standard}})}$$

where

$$\hat p_{\text{pooled}} = \frac{x_{\text{new}} + x_{\text{standard}}}{n_{\text{new}} + n_{\text{standard}}}$$

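Modified Confidence Interval for Difference of Proportions

Instead of using the actual proportions $\hat p = x / n$, we modify them in a similar way as with the CI for one proportion:

$$\tilde n = n + 2, \qquad \tilde p = \frac{x + 1}{n + 2}$$

$$\tilde p_{\text{new}} - \tilde p_{\text{standard}} \pm z_{\alpha/2} \sqrt{\frac{\tilde p_{\text{new}}(1 - \tilde p_{\text{new}})}{\tilde n_{\text{new}}} + \frac{\tilde p_{\text{standard}}(1 - \tilde p_{\text{standard}})}{\tilde n_{\text{standard}}}}$$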

Modified Confidence Interval for Difference of Proportions

$$\tilde p_{\text{new}} - \tilde p_{\text{standard}} \pm z_{\alpha/2} \sqrt{\frac{\tilde p_{\text{new}}(1 - \tilde p_{\text{new}})}{\tilde n_{\text{new}}} + \frac{\tilde p_{\text{standard}}(1 - \tilde p_{\text{standard}})}{\tilde n_{\text{standard}}}}$$

$$(12 + 1)/(250 + 2) - (5 + 1)/(200 + 2) = 0.0516 - 0.0297 = 0.0219$$

$$\sqrt{(0.0516)(0.9484)/252 + (0.0297)(0.9703)/202} = 0.01835$$

$$0.0219 \pm (1.960)(0.01835) = 0.0219 \pm 0.0360$$

$$(-0.0141,\ 0.0579)$$

Compare to the standard interval $(-0.0112,\ 0.0572)$.
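The modified interval only changes the inputs to the usual formula: one artificial success and one artificial failure are added to each sample. A minimal Python sketch using only the standard library; the function name `modified_diff_proportion_ci` is made up for illustration. With the example counts, the difference of the adjusted proportions is 13/252 - 6/202, about 0.0219.

```python
import math
from statistics import NormalDist


def modified_diff_proportion_ci(x_new, n_new, x_std, n_std, conf=0.95):
    """CI for p_new - p_std after adding 1 to each x and 2 to each n."""
    nt_new, nt_std = n_new + 2, n_std + 2
    pt_new = (x_new + 1) / nt_new
    pt_std = (x_std + 1) / nt_std
    se = math.sqrt(pt_new * (1 - pt_new) / nt_new + pt_std * (1 - pt_std) / nt_std)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = pt_new - pt_std
    return diff, (diff - z * se, diff + z * se)


diff, (lower, upper) = modified_diff_proportion_ci(12, 250, 5, 200)
print(f"adjusted difference = {diff:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

For counts this size the adjusted interval is close to the unadjusted one; the adjustment matters more when the counts of successes or failures are very small.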

Modified CI for Δp

There usually won't be much difference between the traditional interval and the modern interval. The modern interval is probably slightly better, but it is not much used.

You need to use the modern interval for problems in the book. In other cases, you may use whichever one you choose.

CI for one p: add 2 to x and 4 to n.
CI for Δp: add 1 to each x and 2 to each n.

In both cases, there are a total of 4 artificial observations equally split between the two outcomes.

Paired Data

For a two-sample test or CI, we have two independent samples. Sometimes, we are interested in a difference of means when the $X$ and $Y$ values each are paired up, meaning associated with one and only one of the other.

A drug treatment is supposed to reduce arthritis pain. For each patient, we have a pain measure before and a pain measure after treatment.

The standard and mini Wright meters measure peak air flow. Each subject has one measurement with each meter, so they are paired.

We have pairs of data $x_i, y_i$ and we want to know by how much the means of $X$ and $Y$ differ.

$$d_i = x_i - y_i, \qquad \mu_D = \mu_X - \mu_Y$$

We now have a one-sample problem with a confidence interval for the mean difference.

$$\frac{\bar d - \mu_D}{s_d / \sqrt{n}} = t_{n-1}$$

The confidence interval for the difference in means is

$$\bar d \pm t_{n-1, \alpha/2}\, s_d / \sqrt{n}$$

For the Wright data, there are 17 pairs of values, mini and standard, which leads to 17 differences (mini − standard).

$$\bar d = 2.117, \qquad s_d = 38.77$$
$$s_d / \sqrt{17} = 38.77 / 4.123 = 9.402$$
$$t_{16} = 2.117 / 9.402 = 0.225$$
$$2.117 \pm (2.120)(9.402) = 2.117 \pm 19.93$$
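Once the differences are formed, the paired analysis is ordinary one-sample arithmetic. A minimal Python sketch from the summary statistics quoted for the Wright data; the function name `paired_t_summary` is made up for illustration, and the critical value 2.120 for 16 degrees of freedom is taken from the slide rather than computed.

```python
import math


def paired_t_summary(d_bar, s_d, n, t_crit):
    """Standard error, t statistic (null mean 0) and CI for a mean difference."""
    se = s_d / math.sqrt(n)
    t_stat = d_bar / se
    half_width = t_crit * se
    return se, t_stat, (d_bar - half_width, d_bar + half_width)


# Summary of the 17 differences (mini - standard); t_crit assumed from the slide.
se, t_stat, ci = paired_t_summary(d_bar=2.117, s_d=38.77, n=17, t_crit=2.120)
print(f"se = {se:.3f}, t = {t_stat:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval is wide relative to the mean difference and contains zero, in line with the small t statistic.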

We have pairs of data $x_i, y_i$ and we want to know if the means of $X$ and $Y$ differ.

$$d_i = x_i - y_i, \qquad \mu_D = \mu_X - \mu_Y, \qquad H_0: \mu_D = 0$$

We now have a one-sample problem with a null hypothesis that the mean is 0.

$$\frac{\bar d - 0}{s_d / \sqrt{n}} = t_{n-1}$$

For the Wright data, there are 17 pairs of values, mini and standard, which leads to 17 differences (mini − standard).

$$\bar d = 2.117, \qquad s_d = 38.77$$
$$s_d / \sqrt{17} = 38.77 / 4.123 = 9.402$$
$$t_{16} = 2.117 / 9.402 = 0.225$$
$$p = 0.8246$$

No significant difference.

Paired Data vs. Independent Samples

One group of metabolic syndrome patients is given a real-time glucose monitoring device, and another has monthly glucose tolerance tests. At the end of the study, one year after it starts, the diabetic status of each patient is assessed. These are independent samples.

A group of patients has their diabetic status assessed and are then given a real-time glucose monitoring device. The change in status is determined for each patient. These are paired data.

Paired Data vs. Independent Samples

In paired data, there are measurements like before/after. Each experimental unit/subject/patient has two measurements, one of type 1 and one of type 2, such as before/after.

We subtract the two measurements, (type 2) − (type 1), and make a CI of these differences.

In paired data, we have a confidence interval for the mean difference. In independent samples, we have a confidence interval for the difference of means.

Difference of Variances

Sometimes we want to know if the population variance of two samples is different. There is a standard test for this determined by taking the ratio of the sample variances.

Under normality, the sample variance is a multiple of a chi-squared distribution.

If both samples are normal, and they are statistically independent, then the ratio of the variances has an F-distribution with degrees of freedom $(n_1 - 1,\ n_2 - 1)$.

This test is very sensitive to normality (unlike the t-test), so must be used with great caution.

If the population is normal, and not otherwise, then the sample variance has distribution

$$s^2 \sim \sigma^2 \chi^2_{n-1} / (n - 1)$$

$$E(s^2) = \sigma^2, \qquad V(s^2) = \sigma^4 \left(2(n - 1)\right) / (n - 1)^2 = 2\sigma^4 / (n - 1)$$

BUT ONLY UNDER NORMALITY OF THE POPULATION

$$s_1^2 / s_2^2 \sim F(n_1 - 1,\ n_2 - 1)$$

(when the two population variances are equal). Use Table A.6 or the MATLAB command vartest2.

V(control) = 5,744
V(treatment) = 19,300
F(5, 6) = 19,300/5,744 = 3.3601
p = 0.1723
95% CI (0.56, 23.45)
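The F ratio and its confidence interval follow from the two sample variances and two F percentage points. This is a minimal Python sketch; the standard library has no F distribution, so the upper 2.5% points 5.988 for F(5, 6) and 6.978 for F(6, 5) are hard-coded as assumed table values and should be checked against Table A.6 or software before being relied on.

```python
# Sample variances and sizes from the cartilage example.
var_treat, n_treat = 19300.0, 6
var_control, n_control = 5744.0, 7

f_stat = var_treat / var_control
df1, df2 = n_treat - 1, n_control - 1

# Assumed table values: upper 2.5% points of the F distribution.
F_UPPER_5_6 = 5.988  # F(5, 6)
F_UPPER_6_5 = 6.978  # F(6, 5); its reciprocal is the lower 2.5% point of F(5, 6)

# 95% CI for the ratio of population variances (treated / control).
ci = (f_stat / F_UPPER_5_6, f_stat * F_UPPER_6_5)

print(f"F({df1}, {df2}) = {f_stat:.4f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval contains 1, so these data do not rule out equal variances, but with 6 and 7 observations it is far too wide to support treating the variances as equal.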

    >> [h p ci stats] = vartest2(treat,control)
    h = 0
    p = 0.1723
    ci = 0.5612
         23.4455
    stats = fstat: 3.3601
              df1: 5
              df2: 6